FPGA Acceleration

Hardware Architect Freedoms

The most basic way to improve performance is the memory itself. Accelerator Engines have either
- 512Mb of memory with tRC of 2.67ns
- 1Gb of memory with tRC of 2.67ns
- Either option easily replaces many QDR devices
Signal integrity on the board
- The Accelerator Engine I/O is implemented using SerDes
- The GCI protocol allows as few as 4 lanes to be used
- Two independent 8-lane ports
- The two GCI ports allow the device to be used as a dual-port memory
- Board layout of Gb-speed serial I/O is considerably more reliable than routing the many QDR or DRAM signals that run at 350MHz or greater
Simplifying the RTL by moving algorithms or functions into the Accelerator Engine

Software Architect Freedoms

Add 512Mb or 1Gb of memory (QDR replacement)
- To eliminate swapping data with smaller memory
- To provide fast access to common tables
Base accelerator engines have
- Fixed Burst READ or WRITE functions
  - These allow one function call to execute multiple READS and WRITES
- Fixed RMW functions
  - These allow a single RMW function call to execute a READ, a specified MODIFY, then WRITE
  - Atomic operation can also be maintained
Using a PHE (Programmable HyperSpeed Engine), with its 32 onboard RISC core processors and additional memory, makes it possible to
- Move algorithms or functions into the Accelerator Engine
  - Complex algorithms or functions that use considerable RTL/Resources
  - Time consuming tasks
  - Repetitive tasks
- Parallel processing
  - The PHE Accelerator Engine has 32 RISC core processors
  - Up to 8 threads per processor (256 threads in total)
  - Install many copies of an algorithm/function for parallel processing and let the PHE handle assigning each task to a processor
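
The parallel-processing bullets above can be illustrated with a small software model. This is only a sketch: `PheModel`, `dispatch`, and `release` are hypothetical names, not part of any MoSys API, and the first-free-slot policy is an assumption. The only figures taken from the text are 32 cores and 8 threads per core.

```python
from dataclasses import dataclass, field

CORES = 32             # RISC cores in the PHE, per the text
THREADS_PER_CORE = 8   # threads per core, per the text


@dataclass
class PheModel:
    """Toy model of handing tasks to free thread slots (hypothetical)."""
    busy: dict = field(default_factory=dict)  # (core, thread) -> task id

    @property
    def total_threads(self) -> int:
        return CORES * THREADS_PER_CORE

    def dispatch(self, task_id: str):
        """Give the task the first free (core, thread) slot; None if all are busy."""
        for core in range(CORES):
            for thread in range(THREADS_PER_CORE):
                slot = (core, thread)
                if slot not in self.busy:
                    self.busy[slot] = task_id
                    return slot
        return None

    def release(self, slot) -> None:
        """Mark a slot as free again once its task has finished."""
        self.busy.pop(slot, None)


phe = PheModel()
# Try to start one more task than there are thread slots.
slots = [phe.dispatch(f"task-{i}") for i in range(257)]
print(phe.total_threads)   # 32 * 8 thread slots
print(slots[-1])           # the 257th task finds no free slot
```

In this model the caller never chooses a core; it only submits work, which mirrors the idea of letting the engine decide where each copy of a function runs.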

Option 1: Simple QDR Replacement – Increase Memory and Simplify Board Signal Routing and Integrity

1. High-Speed Serial Protocol I/O Interface

Our 16 SerDes lanes can each transmit data at up to 12.5Gbps, with an optional rate of 10Gbps. MoSys’ GigaChip Interface (GCI) delivers full-duplex, CRC-protected data throughput, enabling up to 10 billion memory transactions per second on as few as 16 signals.

Traditional memory designs require many interface pins (in some cases thousands), making signal routing and integrity a design challenge.

Each Accelerator Engine has 2 completely independent 8-lane I/O ports that allow simultaneous memory access operations.
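
As a rough worked example of the lane figures above, the raw line rate in one direction is simply lanes multiplied by the per-lane rate. The helper name `aggregate_raw_gbps` is made up for illustration, and the result ignores line encoding, CRC, and protocol overhead, none of which are quantified here, so usable payload bandwidth will be lower.

```python
def aggregate_raw_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Raw one-direction line rate: lanes x per-lane rate, with no overhead removed."""
    return lanes * lane_rate_gbps


both_ports = aggregate_raw_gbps(16, 12.5)   # all 16 lanes at 12.5Gbps
one_port = aggregate_raw_gbps(8, 12.5)      # a single 8-lane port
minimum = aggregate_raw_gbps(4, 10.0)       # 4 lanes at the optional 10Gbps rate

print(both_ports, one_port, minimum)        # 200.0 100.0 40.0
```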

Device	Memory	tRC	Latency
BE2	512Mb	2.67ns	6ns
BE3	1Gb	2.67ns	~25ns
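
The tRC column can be turned into an access rate with one division. The sketch below assumes tRC is the minimum time between successive random accesses to the same bank, which is the usual meaning of the term. It gives a per-bank bound only; the device-level rate depends on how many banks are interleaved, which the table does not state. The function name is illustrative.

```python
def same_bank_rate_millions_per_s(trc_ns: float) -> float:
    """Upper bound on back-to-back accesses to one bank, in millions per second."""
    # 1 access per trc_ns nanoseconds = (1000 / trc_ns) million accesses per second
    return 1e3 / trc_ns


rate = same_bank_rate_millions_per_s(2.67)
print(f"{rate:.1f} million accesses per second per bank")  # about 374.5
```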

KEEP IT SIMPLE, BUT MAKE IT RUN FAST!

Serial I/O
- Has 2 full-duplex ports of up to 8 SerDes lanes each
  - SerDes capable of running at 10Gbps to 25Gbps
- Can operate with as few as 4 lanes

Base Acceleration Engines include
- Fixed Burst READ and WRITE functions
- Fixed RMW function
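
To make the semantics of the burst and RMW functions concrete, here is a minimal software model. It is not the vendor API: the class and method names are hypothetical, the lock merely stands in for whatever the device does to keep an RMW atomic, and returning the old value from `rmw` is an assumption.

```python
import threading


class AcceleratorMemoryModel:
    """Software stand-in for burst and read-modify-write calls (hypothetical names)."""

    def __init__(self, words: int):
        self._mem = [0] * words
        self._lock = threading.Lock()

    def _check(self, addr: int, count: int) -> None:
        if addr < 0 or count < 0 or addr + count > len(self._mem):
            raise IndexError("burst exceeds memory bounds")

    def burst_write(self, addr: int, values) -> None:
        """One call writes several consecutive locations."""
        values = list(values)
        self._check(addr, len(values))
        with self._lock:
            self._mem[addr:addr + len(values)] = values

    def burst_read(self, addr: int, count: int) -> list:
        """One call reads several consecutive locations."""
        self._check(addr, count)
        with self._lock:
            return self._mem[addr:addr + count]

    def rmw(self, addr: int, modify):
        """READ, apply the given MODIFY, then WRITE, as one indivisible step."""
        self._check(addr, 1)
        with self._lock:
            old = self._mem[addr]
            self._mem[addr] = modify(old)
            return old


mem = AcceleratorMemoryModel(16)
mem.burst_write(4, [10, 20, 30])
previous = mem.rmw(5, lambda v: v + 1)   # e.g. bump a statistics counter
print(previous, mem.burst_read(4, 3))    # 20 [10, 21, 30]
```

A counter update is the typical case: without an atomic RMW, the caller would need a separate read, an add, and a write, with a window in which another requester could change the value.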

Option 2: Dual Port Memory

Each of the two 8-lane I/O ports can operate independently
- Allows sharing of its memory resources
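
The dual-port idea can be sketched in software as two independent handles onto one backing store. The names `SharedMemory` and `Port` are illustrative only, and the model ignores timing and arbitration between the ports, which are not described here.

```python
class SharedMemory:
    """The single memory array that both ports can reach."""

    def __init__(self, words: int):
        self.words = [0] * words


class Port:
    """One independent access path into the shared memory (hypothetical model)."""

    def __init__(self, name: str, memory: SharedMemory):
        self.name = name
        self.memory = memory

    def write(self, addr: int, value: int) -> None:
        self.memory.words[addr] = value

    def read(self, addr: int) -> int:
        return self.memory.words[addr]


mem = SharedMemory(8)
port_a = Port("A", mem)   # e.g. attached to one FPGA or processor
port_b = Port("B", mem)   # e.g. attached to a second device

port_a.write(3, 0xCAFE)
value_seen_by_b = port_b.read(3)
print(hex(value_seen_by_b))   # 0xcafe
```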

Option 3: Pipelining Data

Option 4: Accelerating FPGA Performance Using BLAZAR Accelerator Engines

Step 1: Identify FPGA Functions to Offload to the Accelerator Engine

Simplify software using the included fixed BURST and RMW functions
- BURST: READ or WRITE multiple locations with a single function call
- RMW: READ/MODIFY/WRITE with a single function call
  - Statistics/counters
  - Atomic operations can be assured
FPGA tasks that would execute faster on the PHE's 32 RISC cores
- Simplify the RTL
  - Move functions into the PHE
  - Gives the System Architect flexibility to divide tasks between hardware RTL and software
- Move complex algorithms/functions
  - TCAM
  - Prefix matching
  - Data analysis
  - Computational functions
  - Analytical functions
- General tasks
  - Time consuming tasks
  - Repetitive tasks
  - High RTL usage tasks
- Speed increase using
  - Parallel processing (32 cores)
  - 256 threads
  - The engine scheduler to optimize execution
    - Install multiple copies of the same algorithm/function and the scheduler will find available processors
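
Prefix matching, one of the candidate functions listed above, shows the kind of self-contained lookup that can be moved out of RTL. The sketch below is a plain software illustration of longest-prefix matching using Python's standard `ipaddress` module. It is a simple linear scan, not the algorithm a PHE would run, and the routing table and port names are invented for the example.

```python
import ipaddress


def longest_prefix_match(table: dict, address: str):
    """Return the value of the most specific prefix containing the address, or None."""
    addr = ipaddress.ip_address(address)
    best_net, best_value = None, None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching network with the longest prefix length seen so far.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_value = net, value
    return best_value


# Hypothetical routing table, for illustration only.
routes = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port-1",
    "10.1.0.0/16": "port-2",
}

print(longest_prefix_match(routes, "10.1.2.3"))     # port-2
print(longest_prefix_match(routes, "10.9.9.9"))     # port-1
print(longest_prefix_match(routes, "192.168.1.1"))  # default
```

Each lookup is independent of the others, which is why this class of function suits running many copies in parallel.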

Step 2: Identified Functions for Offloading

There are Multiple Reasons to Move Functions out of the FPGA

Functions Run
- Faster
- Multiple copies can be installed and executed in parallel
- Execution priority can be set
- System Flexibility
- Save cost of an ASIC

Simplify RTL
- Combine multiple functions into one user-defined higher-level function
- Define user functions that cannot be implemented in the RTL or that would not execute fast enough there

Save FPGA Space
- Free up resources to “Do More”
- Simplify RTL by moving common or frequently called functions into PHE

Step 3: Do More…Achieve HyperSpeed!
