Save 2-8 System Operations with In-Memory Functions (Part 1 BURST)

Tuesday May 5, 2020

By Mark Baumann

Director, Product Definition & Applications

MoSys, Inc.

From the very beginning of the MoSys Accelerator Engine family, the goals have included both building a device that delivers real system acceleration and making that device easy to use. These goals led to the addition of two very basic functions to the feature set of the Accelerator Engines, both part of the In-Memory Functions available on the devices. These functions are:

Burst – The ability to burst up to eight 72-bit words on a single command
Built-in ALU – Allows the user to take advantage of a “fire-and-forget” ability to perform actions such as maintaining statistics, aging table entries, or even metering (with two-bucket, three-color capability)

These functions are commonly used, or at least desirable, but they often cost too much in logic or time to implement, so system architects minimize them or trade them away for more highly valued functions. That creates an opening for differentiation, provided the cost to implement and support them is reasonable.

Let’s take a look at each of these functions individually.

BURST

If an application requires pure buffering of data or an over-subscription buffer, then data must move at line speed (100Gbps, 200Gbps, or even up to 400Gbps). All that really happens to the data is that it is accepted at line rate, perhaps parked for a while, and then sourced back out at line rate. This is most likely a case where no query is performed on the data itself; the system just needs to accept the data for short bursts while the rest of the system “catches up”.

This is a very common need, since in most cases designers do not want to over-provision a system for short bursts of data but would rather provision appropriately for expected data patterns. Buffering short-term, high-traffic bursts therefore helps smooth out the problems those bursts cause.
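As a rough illustration of the provisioning trade-off described above, the sketch below estimates how much storage a burst consumes when data arrives faster than the rest of the system can drain it. The function names and the example rates (100 Gbps arriving, 80 Gbps draining, a 10 microsecond burst) are hypothetical and chosen only for the arithmetic; they are not MoSys figures.

```python
import math

WORD_BITS = 72  # word width used by the Accelerator Engines, per the text


def buffer_bits_needed(arrival_gbps: float, drain_gbps: float, burst_us: float) -> float:
    """Bits that pile up while data arrives faster than it is drained."""
    excess_gbps = max(arrival_gbps - drain_gbps, 0.0)
    # 1 Gbps sustained for 1 microsecond is 1,000 bits.
    return excess_gbps * burst_us * 1_000


def words_needed(bits: float) -> int:
    """Number of 72-bit words required to hold the given number of bits."""
    return math.ceil(bits / WORD_BITS)


# Hypothetical example: 100 Gbps arriving, 80 Gbps draining, for 10 microseconds.
backlog = buffer_bits_needed(100, 80, 10)
print(backlog, words_needed(backlog))
```

If the drain rate matches or exceeds the arrival rate, the backlog is zero and no smoothing buffer is needed for that burst.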

To support this movement of data and even provide the means to smooth out the traffic bursts, the MoSys Accelerator Engines allow data to be burst both into and out of the device. This gives the system designer a single point at which the smoothing can be accomplished. The Accelerator Engines support data transfers of words that are up to 72 bits wide. In a normal single-word transfer, a read or write command is issued along with the data word (in this case 72 bits), so the command bandwidth equals the data bandwidth: one command is issued for every data word.

With MoSys Accelerator Engines, transfer efficiency is increased by allowing one command to support up to 8 data transfers, so the bandwidth used by commands can be as low as 1/8 of that used by data. This becomes increasingly important when commands and data share a bus, as is the case with the Accelerator Engines, whose interfaces use SerDes to support a lower device pin count and an easy growth path for the future.
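To make the 1/8 figure concrete, here is a minimal sketch of a simplified slot-count model. It assumes each command and each 72-bit data word occupies one equal-sized slot on the shared link, with command and data traveling in the same direction; the real link framing may differ, and the function names are illustrative only.

```python
from fractions import Fraction


def command_to_data_ratio(burst_len: int) -> Fraction:
    """Command slots per data slot when one command covers burst_len words."""
    return Fraction(1, burst_len)


def payload_share(burst_len: int) -> Fraction:
    """Fraction of shared-link slots that carry data rather than commands."""
    return Fraction(burst_len, burst_len + 1)


# Burst length 1 is an ordinary single-word transfer.
for n in (1, 2, 4, 8):
    print(n, command_to_data_ratio(n), float(payload_share(n)))
```

Under this model a single-word transfer spends half of the shared slots on commands, while a burst of 8 spends one slot in nine.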

The result is more efficient use of the SerDes pins, because bursting saves bandwidth on a bus that carries both data and commands.

The following is an example of a system that can easily support 100Gbps transfers and has an easy growth path to 160Gbps using only 12.5G SerDes on the BE-2. The FPGA performs all the frame processing, and the BE-2 provides temporary storage until processing is complete, all at line rate.

100 Gbps ingress and 100 Gbps egress
Leverage:
- 16 lanes at 12.5 G
  - 180 Gbps duplex >> 100G
- Dedicated Rx and Tx
  - No bus turn penalty
  - Continuous data streaming
- Burst Function
  - Amortize up to 8 data words on one command
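The lane figures in the list above can be checked with a little arithmetic. The sketch below takes the 16 lanes, the 12.5 G lane rate, and the 180 Gbps figure from the list, and assumes the 180 Gbps applies to each direction. The 0.9 ratio it derives is only what those numbers imply, not a published specification, and the function names are hypothetical.

```python
LANES = 16                  # from the list above
LANE_GBPS = 12.5            # from the list above
QUOTED_USABLE_GBPS = 180.0  # from the list above; assumed to be per direction


def raw_gbps(lanes: int, lane_gbps: float) -> float:
    """Aggregate raw line rate across all lanes."""
    return lanes * lane_gbps


def implied_efficiency(usable_gbps: float, raw: float) -> float:
    """Ratio of quoted usable bandwidth to raw line rate (inferred, not a spec)."""
    return usable_gbps / raw


def supports(target_gbps: float, usable_gbps: float = QUOTED_USABLE_GBPS) -> bool:
    """True if the quoted usable bandwidth covers the target line rate."""
    return usable_gbps >= target_gbps


raw = raw_gbps(LANES, LANE_GBPS)
print(raw, implied_efficiency(QUOTED_USABLE_GBPS, raw))
print(supports(100), supports(160))
```

Both the 100Gbps case and the 160Gbps growth path mentioned earlier fit within the quoted figure.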

The following chart reflects the throughput that the Accelerator Engine products support while utilizing the Burst feature, in comparison to the throughput of other common memories used in networking. As an added benefit of the Bandwidth Engine devices, a user can easily tune to the performance needed by using only the required number of SerDes lanes and/or by selecting the burst size of each transfer.

Effective throughput of payload; 72b per word
- BL# = Burst length; linear burst of 2, 4 or 8 words
- Full duplex; balanced read and write

Up until now, we have been discussing the benefits that burst brings to transfers of larger blocks of data. The Accelerator Engines also bring a benefit to smaller transfers. Consider a transfer of just 144 bits, or two 72-bit words. A standard memory transfer would use two write commands and two data transfers, four in total. With a simple burst-of-two option, you issue one command and two data cycles, three in total, saving one-quarter of the transfers. This may not seem overly important, but it can benefit systems that count every transfer cycle to preserve system performance and power.
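The same counting can be extended to other payload sizes. The sketch below splits a payload into bursts of 8, 4, or 2 words, with single-word transfers for any remainder, and counts one slot per command plus one per data word. The largest-first splitting policy and the helper names are assumptions made for illustration, not a description of how the device or its controller schedules bursts.

```python
from fractions import Fraction

WORD_BITS = 72
BURST_LENGTHS = (8, 4, 2, 1)  # largest first; 1 is an ordinary single-word transfer


def plan_bursts(payload_bits: int) -> list[int]:
    """Split a payload into burst lengths, largest first (assumed policy)."""
    if payload_bits <= 0:
        raise ValueError("payload_bits must be positive")
    words = -(-payload_bits // WORD_BITS)  # ceiling division
    bursts = []
    for length in BURST_LENGTHS:
        count, words = divmod(words, length)
        bursts.extend([length] * count)
    return bursts


def slots_saved_fraction(payload_bits: int) -> Fraction:
    """Fraction of command-plus-data slots saved versus single-word transfers."""
    bursts = plan_bursts(payload_bits)
    words = sum(bursts)
    single_word_slots = 2 * words       # one command per data word
    burst_slots = len(bursts) + words   # one command per burst
    return Fraction(single_word_slots - burst_slots, single_word_slots)


print(plan_bursts(144), slots_saved_fraction(144))  # the 144-bit case from the text
print(plan_bursts(792), slots_saved_fraction(792))  # 11 words
```

For the 144-bit case this reproduces the one-quarter saving described above, and the saving grows as more words share a single command.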

We hope this illustrates how a simple change, such as the addition of a burst function, can improve your overall system performance. In the next installment, Part 2, we will look at another feature, read-modify-write (R-M-W), which can also increase system throughput and performance.


