Accelerating FPGA Applications, Reducing Costs and Shortening Design Time

  • MoSys has been developing memory-based products for close to 20 years, starting with its 1T-SRAM (one-transistor SRAM) memory IP. 1T-SRAM has access speeds close to those of SRAM but supports a density approaching that of RLDRAM, and it has revolutionized the memory industry. It is still used by many companies in their products today.
  • Most memory manufacturers have focused on speed or density, believing that one size fits all. They have not considered that adding a level of intelligence to memory could speed up applications.
  • The result is a new class of memory devices called Accelerator Engines.
  • Each Accelerator Engine is a combination of four capabilities:
    • High Capacity Memory of 576Mb or 1.152Gb, with a tRC of 2.67ns
    • High Speed Embedded In-Memory Functions (Intelligence), including BURST and RMW
    • User-Defined In-Memory Functions Using 32 Embedded RISC Cores
    • High Speed Serial Interface for High Bandwidth and Simplified Board Layout
  • MoSys-Supplied FPGA RTL Memory Controller: Handles All Serial Communication and Provides a Parallel, QDR-Like Interface
  • The Embedded In-Memory Functions BURST and RMW are designed to execute much faster in memory than the equivalent operations could with traditional memory. For the highest possible acceleration, common or complex functions can be moved into an Accelerator Engine with 32 RISC cores for HyperSpeed performance.
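As a rough reading of the 2.67ns tRC figure, assume tRC is the minimum interval between successive random accesses to the same memory partition (an assumption; this section does not define it). One partition can then start at most about 375 million random accesses per second. The aggregate device rate depends on how many partitions are accessed in parallel, which is not specified here.

```latex
f_{\max} = \frac{1}{t_{RC}} = \frac{1}{2.67\ \text{ns}} \approx 374.5 \times 10^{6}\ \text{accesses/s per partition}
```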

Key Features


The Accelerator Engine memories resulted from the development of the MoSys Blazar Accelerator Engine Family.

Diagram of the key components of the MoSys Blazar Accelerator Engine Family.

 

General Application Selector Guide

The Accelerator Engine Memory IC family includes:

  • QPR4 (Quad Partition Rate) 0.5 Gb
  • QPR8 (Quad Partition Rate) 1 Gb
  • Bandwidth Engine 2 BURST (BE2-BURST)
  • Bandwidth Engine 3 BURST (BE3-BURST)
  • Bandwidth Engine 2 RMW (BE2-RMW)
  • Bandwidth Engine 3 RMW (BE3-RMW)
  • Programmable HyperSpeed Engine (PHE)

Block Diagrams of Memory Architecture and Capacity

BE2-BURST with 576Mb
BE2-RMW with 576Mb
BE3-BURST with 1.152Gb
BE3-RMW with 1.152Gb


QDR Parallel vs MoSys Serial Comparison

Memory Capacity

  • From 576Mb to 1Gb

Costs

  • 1/3 the cost per Mbit

Design

  • Fewer than 10% of the pins required by QDR
  • Faster board layout
  • Lower power

Overall benefits

  • Higher performance
  • Higher memory capacity
  • Easy to design
  • Quicker time to market

In Memory Functions: Intelligent Acceleration Functions

  • By using the In-Memory Functions:
    • Applications are accelerated beyond the limitations of memory access speeds.
    • Acceleration comes from executing the In-Memory Function IN the memory chip (Accelerator Engine), which reduces the number of external system commands needed to accomplish the same task.
  • Adding Accelerator Engine ICs as part of your overall memory strategy enables your applications to run faster and more efficiently. This level of performance is achieved by leveraging MoSys’ heritage of superior memory architecture, high-speed SerDes input/output transmission, and advanced IMF (In-Memory Function) technology.
  • Because an In-Memory Function executes without external intervention, fewer system operations are needed outside the memory, freeing more system time for other application tasks.
  • Simplify your RTL by moving functions out of the FPGA and into the memory
  • MoSys has defined three groups of Embedded In-Memory Functions (EIMFs):
    • BURST FUNCTIONS: high-speed sequential read and write operations for data movement
    • RMW (Read/Modify/Write) FUNCTIONS: in-device data modification and decisions
    • USER DEFINED FUNCTIONS: common tasks or complex algorithms
  • Each group delivers a different level of performance acceleration. A MoSys Accelerator Engine IC is defined by which functions it embeds and whether it has 576Mb or 1.152Gb.


In Memory Function – BURST Functions

  • Focused on DATA MOVEMENT: getting data in and out of the memory faster and more efficiently by reducing the number of command cycles.
  • A typical BURST In-Memory Function lets the system read and/or write sequential memory locations by supplying only the starting address and an access length of 2, 4, or 8 locations.
  • The BURST Read/Write In-Memory Functions can combine up to 8 READS and 8 WRITES into a single BURST command.
  • Triples the amount of data moved by reducing the number of command cycles
  • BURST Functions can execute simultaneously, further increasing system performance
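To make the command savings above concrete, here is a minimal command-count sketch. It assumes a simplified, hypothetical model (not the MoSys command set): ordinary memory spends one command per location, while one BURST command carries up to 8 reads and up to 8 writes, and a burst covers 2, 4, or 8 sequential locations from a single start address. The names `burst_addresses`, `single_access_commands`, and `burst_commands` are illustrative only.

```python
import math

# Hypothetical command-count model (an assumption, not the MoSys command set):
# an ordinary access costs one command per location, while one BURST command
# covers 2, 4 or 8 sequential locations starting at a single supplied address.
BURST_LENGTHS = (2, 4, 8)
MAX_PER_BURST = 8  # up to 8 READs and 8 WRITEs per BURST command


def burst_addresses(start: int, length: int) -> list[int]:
    """Locations touched by one burst; only `start` is sent with the command."""
    if length not in BURST_LENGTHS:
        raise ValueError(f"burst length must be one of {BURST_LENGTHS}")
    return [start + offset for offset in range(length)]


def single_access_commands(reads: int, writes: int) -> int:
    """Traditional memory: one command per location read or written."""
    return reads + writes


def burst_commands(reads: int, writes: int) -> int:
    """BURST model: each command carries up to 8 reads and up to 8 writes."""
    return max(math.ceil(reads / MAX_PER_BURST), math.ceil(writes / MAX_PER_BURST))


if __name__ == "__main__":
    print(burst_addresses(0x100, 4))            # [256, 257, 258, 259]
    print(single_access_commands(8, 8))         # 16
    print(burst_commands(8, 8))                 # 1
    print(single_access_commands(64, 0), burst_commands(64, 0))  # 64 8
```

Under this model, 8 reads plus 8 writes drop from 16 commands to 1. The model counts commands only and ignores command-cycle timing, so it does not by itself reproduce the tripling figure quoted above.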

Example of BURST In-Memory Execution

In Memory Functions – RMW Functions

  • Focused on DATA COMPUTING AND DECISION, where a memory location must be modified through RMW, as in metering applications and single or dual counter updates for statistics.
  • More than 27 operations are available, such as add, subtract, compare, and increment.
  • An RMW function executes in one command, where traditional memory requires three commands:
    • Modifying a location takes one command to READ the memory location, a second command to MODIFY the value, and a third command to WRITE the new value back to the memory location.
  • The RMW Functions provide at least two levels of speed acceleration.
    • First, the RMW functions can be executed with a single command.
    • Second, since the modification is executed within memory, there is no need to move the data out to be modified, and then back into memory to write. This removes all of the associated I/O latency.
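The two levels of acceleration described above can be illustrated with a toy model that counts system operations and data words crossing the memory interface. `ToyMemory`, `rmw_add`, and `traditional_add` are hypothetical names, and counting the host-side MODIFY as one operation mirrors the three-command description in the text; this is a sketch, not the actual Accelerator Engine command set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical model that counts system operations and data words moved
    across the memory interface. It is not the MoSys command set."""
    cells: dict[int, int] = field(default_factory=dict)
    operations: int = 0      # system operations issued for the update
    words_moved: int = 0     # data words crossing the memory interface

    # Traditional memory: plain READ and WRITE commands
    def read(self, addr: int) -> int:
        self.operations += 1
        self.words_moved += 1            # stored value travels out of memory
        return self.cells.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.operations += 1
        self.words_moved += 1            # new value travels back into memory
        self.cells[addr] = value

    # Accelerator Engine style: one in-memory RMW command
    def rmw_add(self, addr: int, operand: int) -> None:
        self.operations += 1
        self.words_moved += 1            # only the operand is sent in
        self.cells[addr] = self.cells.get(addr, 0) + operand


def traditional_add(mem: ToyMemory, addr: int, operand: int) -> None:
    """READ, MODIFY outside the memory, then WRITE back: three operations."""
    value = mem.read(addr)               # 1: READ
    mem.operations += 1                  # 2: MODIFY, performed by the host
    value += operand
    mem.write(addr, value)               # 3: WRITE


if __name__ == "__main__":
    host_side = ToyMemory({0x10: 5})
    traditional_add(host_side, 0x10, 3)

    in_memory = ToyMemory({0x10: 5})
    in_memory.rmw_add(0x10, 3)

    print(host_side.cells[0x10], host_side.operations, host_side.words_moved)  # 8 3 2
    print(in_memory.cells[0x10], in_memory.operations, in_memory.words_moved)  # 8 1 1
```

Both paths leave the same value in memory, but the in-memory version uses one operation instead of three, and the stored value never leaves the device.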

Example of RMW In-Memory Execution


In Memory Functions – USER DEFINED Functions

The Programmable HyperSpeed Engine (PHE) has 32 RISC cores and offers many options for acceleration through firmware running in the device:

Moving functions and operations into the PHE:

  • Functions in the FPGA RTL
    • Commonly used functions
    • Standard and application unique algorithms
    • Special functions
    • Frees up RTL in an FPGA
  • Functions currently in application software, to significantly improve performance
  • General capabilities of the 32 cores:
    • Powerful RISC instruction set that includes instructions for hashing, etc.
    • Allows parallel processing
    • Same function can be installed several times (up to 32 times) and run simultaneously
    • Up to 256 threads
    • Other optional powerful features
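The pattern of installing the same function several times and running the copies simultaneously can be sketched on a host as "replicate, shard, reassemble." In this host-side analogy only, the 32 cores are modelled as 32 worker threads, and CRC-32 stands in for an application hash; `hash_shard` and `run_replicated` are hypothetical names, not PHE firmware or SDK calls. Python threads will not show the hardware speed-up, and the 256-thread option is not modelled.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Host-side analogy only: the 32 PHE RISC cores are modelled here as 32 worker
# threads, each running its own copy of the same function on one shard.
NUM_COPIES = 32  # "the same function can be installed up to 32 times"


def hash_shard(records: list[bytes]) -> list[int]:
    """The replicated function; CRC-32 stands in for an application hash."""
    return [zlib.crc32(record) for record in records]


def run_replicated(records: list[bytes], copies: int = NUM_COPIES) -> list[int]:
    """Split `records` into up to `copies` contiguous shards, run hash_shard on
    every shard concurrently, and return results in the original record order."""
    if not records:
        return []
    shard_size = -(-len(records) // copies)  # ceiling division
    shards = [records[i:i + shard_size] for i in range(0, len(records), shard_size)]
    with ThreadPoolExecutor(max_workers=copies) as pool:
        per_shard = pool.map(hash_shard, shards)  # map yields in shard order
        return [h for shard_result in per_shard for h in shard_result]


if __name__ == "__main__":
    flows = [f"flow-{i}".encode() for i in range(1000)]
    assert run_replicated(flows) == hash_shard(flows)
    print(len(run_replicated(flows)))  # 1000
```

Because each copy works on an independent, contiguous shard, the results can be concatenated in shard order and match a single sequential run exactly.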

Examples of User Defined In Memory Functions

Using the 32 RISC Cores…Think Creatively!

  • User Defined Functions are specialized for a user’s application but here are some possible functions:
    • Bayesian
    • Random Forest of Trees
    • Repetitive data modification
    • Data Analysis
    • Image translation/editing functions
    • High speed buffer data analysis
    • Etc.
  • What functions or algorithms do you need to:
    • Run faster?
    • Move from the FPGA to free up resources?
    • Move from CPU software to MoSys firmware for speed?
    • Speed up through parallel operation (multi-thread option)?

Programmable HyperSpeed Engine (PHE)

32 RISC CORE Architecture for User Defined Functions


Advanced Memory Application Use

Dual Port Memory Interface Application

  • Each Accelerator Engine Memory has two 8-lane serial ports.
  • Each port has 16 data lanes, which is 32 signals.
  • The two ports operate as a true dual-port memory with completely independent, simultaneous access.
  • In addition, MoSys devices have auto-adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure a clean, reliable signal.

Super High Bandwidth Interface

  • For extremely high bandwidth requirements, the two serial ports of each Accelerator Engine Memory can be combined into one super high bandwidth port.