Save 2-8 System Operations with In-Memory Functions (Part 1: BURST)
Tuesday, May 5, 2020
By Mark Baumann
Director, Product Definition & Applications
From the very beginning of the MoSys Accelerator Engine family, the goals have included both ease of use and a device that genuinely helps accelerate the system. Those goals led to the addition of two very basic functions to the Accelerator Engine feature set, both part of the In-Memory Functions available on the Accelerator Engine devices:
- Burst – The ability to burst up to eight 72-bit words on a single command
- Built-in ALU – Lets the user take advantage of a “fire-and-forget” ability to perform actions such as maintaining statistics, aging table entries, or even metering (with two-bucket, three-color capability)
These functions are commonly used, or at least desirable, but they often cost too much logic or time to implement, so system architects minimize them or trade them away for more highly valued functions. That creates an opening for differentiation, provided the cost to implement and support them is reasonable.
Let’s take a look at each of these functions individually.
If an application requires pure data buffering or an oversubscription buffer, data must move at line rate (100 Gbps, 200 Gbps, or even up to 400 Gbps), and all that really happens to it is that it is accepted at line rate, possibly parked for a while, and then sourced back out at line rate. Most likely no query is performed on the data itself; the system just needs to absorb short bursts while the rest of the system “catches up”.
This is a very common need, since in most cases designers do not want to over-provision a system for short bursts of data but rather provision it appropriately for expected traffic patterns. Buffering short-term, high-traffic bursts therefore helps smooth out the problems those bursts cause.
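For the oversubscription case just described, a rough buffer size follows from how much faster data arrives than it can be drained during a burst. The Python sketch below uses a simple fluid model; the function name `burst_buffer_words` and the example rates (a 10 µs burst at 100 Gbps while downstream logic drains only 40 Gbps) are hypothetical, chosen only to illustrate the arithmetic. It uses the fact that 1 Gbps sustained for 1 µs is 1,000 bits.

```python
import math

WORD_BITS = 72  # Accelerator Engine word width, per the article


def burst_buffer_words(ingress_gbps: float, drain_gbps: float, burst_us: float) -> int:
    """Number of 72-bit words needed to absorb a burst without dropping data.

    Simple fluid model (an assumption): during the burst, data arrives at
    ingress_gbps and leaves at drain_gbps; the excess accumulates in the buffer.
    """
    excess_gbps = max(ingress_gbps - drain_gbps, 0.0)
    excess_bits = excess_gbps * burst_us * 1_000  # 1 Gbps for 1 us = 1,000 bits
    return math.ceil(excess_bits / WORD_BITS)


# Hypothetical example: 10 us at 100 Gbps in, 40 Gbps out.
print(burst_buffer_words(100, 40, 10))  # 600,000 excess bits -> 8334 words
```

If the drain rate keeps up with ingress, the model returns zero, which is the case where no smoothing buffer is needed.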
To support this movement of data and provide a means of smoothing traffic bursts, the MoSys Accelerator Engines allow data to be burst both into and out of the device, giving the system designer a single point at which the smoothing can be done. The Accelerator Engines support transfers of words up to 72 bits wide. In a normal single-word transfer, each read or write command is accompanied by one data word (here, 72 bits), so the command bus bandwidth equals the data bus bandwidth: one command is issued for every data word.
The MoSys Accelerator Engines increase transfer efficiency by letting one command carry up to 8 data transfers, so the bandwidth used on the command bus is 1/8 that of the data bus. This matters even more when command and data share the same bus, as they do on the Accelerator Engines, whose interfaces use SerDes to keep device pin count low and provide an easy growth path for the future.
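To make the 1/8 ratio concrete, here is a minimal Python sketch that counts command and data cycles for moving a block of words. The function name `bus_cycles` and the one-command-per-burst model are illustrative assumptions, not the device's actual protocol; burst lengths of 2, 4 and 8 come from the article, and 1 stands for a normal single-word transfer.

```python
import math


def bus_cycles(words: int, burst_len: int) -> tuple[int, int]:
    """Return (command_cycles, data_cycles) to move `words` 72-bit words.

    Model (an assumption for illustration): each command starts one burst of
    up to `burst_len` consecutive words, and each word takes one data cycle.
    """
    if burst_len not in (1, 2, 4, 8):
        raise ValueError("burst length must be 1, 2, 4 or 8")
    return math.ceil(words / burst_len), words


for bl in (1, 2, 4, 8):
    cmds, data = bus_cycles(64, bl)
    print(f"BL{bl}: {cmds} commands for {data} data words (ratio 1/{data // cmds})")
```

Moving 64 words takes 64 commands without bursting but only 8 with BL8, which is the 1/8 command-to-data ratio described above.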
The result is more efficient use of the SerDes pins: because commands and data share the same bus, every command cycle saved is bandwidth freed for data.
The following example shows a system that easily supports 100 Gbps transfers and has an easy growth path to 160 Gbps using only 12.5G SerDes on the BE-2. The FPGA performs all the frame processing, and the BE-2 provides temporary storage until processing is complete, all at line rate.
- 100 Gbps ingress and 100 Gbps egress
- 16 lanes at 12.5 Gbps
- 180 Gbps duplex, well above 100 Gbps
- Dedicated Rx and Tx
- No bus turnaround penalty
- Continuous data streaming
- Burst function
- Amortize up to 8 data words on one command
The following chart shows the throughput the Accelerator Engine products support when using the Burst feature, compared with the throughput of other memories commonly used in networking. As an added benefit of the Bandwidth Engine devices, users can easily tune performance to what they need by using only the required number of SerDes lanes and/or by selecting the burst size of each transfer.
- Effective payload throughput; 72 bits per word
- BL#= Burst length; linear burst of 2, 4 or 8 words
- Full duplex; balanced read and write
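The tuning described above, choosing lane count and burst size, can be approximated with a rough model. The Python sketch below is hypothetical: the function `payload_gbps` assumes each command occupies one 72-bit word slot on the shared link and ignores line coding, framing and the split of read and write traffic across directions, so it will not reproduce the chart's figures; it only shows the trend.

```python
def payload_gbps(lanes: int, lane_gbps: float, burst_len: int) -> float:
    """Rough payload throughput on one direction of a shared command/data link.

    Hypothetical simplification: every command occupies one 72-bit word slot
    on the link, so a burst of `burst_len` words uses burst_len + 1 slots.
    Line coding and framing overhead are ignored, so real numbers are lower.
    """
    if burst_len < 1:
        raise ValueError("burst length must be at least 1")
    raw_gbps = lanes * lane_gbps
    return raw_gbps * burst_len / (burst_len + 1)


for lanes in (8, 16):
    for bl in (1, 2, 4, 8):
        print(f"{lanes} lanes, BL{bl}: {payload_gbps(lanes, 12.5, bl):.1f} Gbps")
```

Under this model, 16 lanes at 12.5 Gbps deliver half their raw bandwidth as payload with single-word transfers but 8/9 of it with BL8, so a designer can trade lane count against burst length.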
Up to now, we have discussed the benefits burst brings to transfers of larger blocks of data. The Accelerator Engines also benefit smaller transfers. Consider a transfer of just 144 bits, a burst of two 72-bit words: a standard memory transfer uses two write commands and two data transfers, four transfers in all. With a simple burst-of-two option, you issue one command and two data cycles, three transfers in all, saving one quarter of the transfers. This may not seem significant, but it benefits systems that count every transfer cycle to preserve performance and save power.
We hope this illustrates how a simple change, such as adding a burst function, can improve your overall system performance. In the next installment, Part 2, we will look at how another feature, read-modify-write (R-M-W), can also increase system throughput and performance.
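The 144-bit example above generalizes to any burst length. The Python sketch below counts transfers the same way the article does (one command plus one data transfer per word for single-word writes, versus one command plus all the data transfers for a burst); the function name `transfer_savings` is hypothetical.

```python
from fractions import Fraction


def transfer_savings(words: int) -> Fraction:
    """Fraction of bus transfers saved by one burst vs. single-word writes.

    Single-word writes: one command plus one data transfer per word (2 * words).
    One burst: one command plus `words` data transfers (words + 1).
    """
    if not 1 <= words <= 8:
        raise ValueError("a burst carries 1 to 8 words")
    single = 2 * words
    burst = words + 1
    return Fraction(single - burst, single)


for n in (2, 4, 8):
    print(f"burst of {n}: saves {transfer_savings(n)} of the transfers")
```

The savings grow from 1/4 for a burst of two to 7/16 for a burst of eight, approaching one half as the command cost is amortized over more words.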