An Overview of Graph Memory Engine Technology – Part 2

By Michael Miller

Chief Technology Officer

MoSys, Inc.

MoSys has developed Application-Specific Accelerator Platforms for growing applications that require functions such as packet filtering, packet forwarding and data analytics. Each platform includes an application-specific Virtual Accelerator Engine (VAE) and software for programming the engine. The software works as an adaptation layer between new and existing software environments and the Virtual Accelerator Engines. The Graph Memory Engine (GME) is a specific member of the family of Virtual Accelerator Engines, developed to accelerate the processing of data structures that can be represented as graphs. Part 1 of this blog focused on the execution of GME technology. Part 2 will feature the API and GME implementations. And Part 3 will cover RTL, the GME interface and platform details.

Common API

Each VAE has its own API, which is specific to the application. For the GME, the key concepts are nodes and edges. The API is common to all implementations of the GME, whether in software, in hardware such as an FPGA, or in some combination of an FPGA and MoSys Acceleration Engines such as the PHE. The Common API can then be used as a direct interface for new software to build graphs in the GME. Alternatively, the API can be used by a layer called the Adaptation Layer, which maps from other existing APIs into the GME API.
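The node-and-edge style of the Common API can be sketched in a few lines. This is a minimal, purely illustrative model: the class and method names (`GraphMemory`, `add_node`, `add_edge`, `delete_edge`, `successors`) are assumptions for the sketch, not the actual MoSys interface.

```python
# Hypothetical sketch of a GME-style common API. All names here are
# illustrative assumptions, not the real MoSys API.

class GraphMemory:
    """Minimal in-memory stand-in for the Graph Memory (GM)."""

    def __init__(self):
        self.nodes = {}     # node_id -> payload
        self.edges = set()  # (src_id, dst_id) pairs

    def add_node(self, node_id, payload=None):
        self.nodes[node_id] = payload

    def add_edge(self, src, dst):
        # Edge updates are the atomic unit of modification.
        self.edges.add((src, dst))

    def delete_edge(self, src, dst):
        self.edges.discard((src, dst))

    def successors(self, node_id):
        return [d for (s, d) in self.edges if s == node_id]


# New software can drive an API like this directly; alternatively, an
# Adaptation Layer would map calls from an existing API onto it.
gm = GraphMemory()
gm.add_node("root")
gm.add_node("match")
gm.add_edge("root", "match")
```

The same call sequence would reach the GM through whatever drivers and hardware a given implementation requires, which is what makes a common API workable across software and FPGA targets.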


The API is made up of a library of routines designed to track all of the nodes, edges and operations, and to translate them into the Graph Memory (GM) of the GME via whatever set of drivers and hardware is required to connect to the actual memory of the particular GME implementation.

Updates to the GM should be considered atomic at the add_edge and delete_edge level. At higher levels of granularity, the programmer should take care to manage atomicity by building subgraphs and finally linking them in with one last update that connects the subgraph into the larger graph. Subsequently, the subgraph can be eliminated from the active portion of “execution” by removing all links. Upon removal of the last link, a subgraph becomes dormant and ready for elimination. The GME will not do automatic garbage collection; it is up to the higher-level code to eliminate the dormant nodes.
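The update discipline above can be sketched as follows. The `GraphMemory` class and its methods are illustrative assumptions (not the real MoSys API); what the sketch shows is the pattern itself: build the subgraph off to the side, splice it in with one atomic edge update, and later retire and reclaim it explicitly.

```python
# Sketch of the recommended update discipline, using an assumed
# GraphMemory model. The GME itself does no garbage collection, so
# step 4 must be done by higher-level code.

class GraphMemory:
    def __init__(self):
        self.nodes = {}
        self.edges = set()

    def add_node(self, node_id, payload=None):
        self.nodes[node_id] = payload

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def delete_edge(self, src, dst):
        self.edges.discard((src, dst))

    def in_degree(self, node_id):
        return sum(1 for (_, d) in self.edges if d == node_id)


gm = GraphMemory()
gm.add_node("live_root")

# 1. Build the subgraph off to the side; nothing can reach it yet.
for n in ("sub_a", "sub_b"):
    gm.add_node(n)
gm.add_edge("sub_a", "sub_b")

# 2. One atomic add_edge makes the whole subgraph live at once.
gm.add_edge("live_root", "sub_a")

# 3. Removing that last link makes the subgraph dormant again...
gm.delete_edge("live_root", "sub_a")

# 4. ...and higher-level code must reclaim the dormant nodes itself.
if gm.in_degree("sub_a") == 0:
    gm.delete_edge("sub_a", "sub_b")
    for n in ("sub_a", "sub_b"):
        del gm.nodes[n]
```

Because only step 2 touches the live graph, readers never observe a half-built subgraph, even though the build itself took many non-atomic operations.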

Graph Memory Engine Implementations

The implementations fall into two categories: software running on CPU cores, and RTL modules delivered as IP for FPGAs or ASICs. The RTL modules can be further elaborated with or without connections to MoSys Accelerator Engine silicon, such as the BE2/BE3 or PHE, through GCI. All of the RTL modules will have the same RTL Module Interface.

Each of the implementations will be node accurate, in the sense that, given the same graph and input vector, the same nodes will be visited and the same results will appear. There is no guarantee that results will come out in the same order. For instance, in the RTL versions, results will be tagged with an input thread ID, while the software implementation has no notion of a thread ID. Knowing that the same nodes will be visited helps the designer predict the performance from one implementation to the next. However, each implementation will have limitations imposed by clock speed, memory access rate and queuing.
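The ordering caveat matters in practice: software driving an RTL implementation should match results to requests by thread ID rather than by completion order. A minimal sketch, with all names (`submit_walks`, the result format) assumed for illustration:

```python
# Illustrative sketch: hardware GME results carry the input thread ID,
# so the caller re-associates them by ID instead of relying on order.
# submit_walks() and its result format are assumptions for this sketch.

import random


def submit_walks(inputs):
    """Pretend hardware: returns (thread_id, result) pairs in an
    arbitrary completion order."""
    results = [(tid, f"leaf_for_{vec}") for tid, vec in inputs]
    random.shuffle(results)  # completion order is not guaranteed
    return results


inputs = [(0, "vec_a"), (1, "vec_b"), (2, "vec_c")]
by_thread = dict(submit_walks(inputs))

# Re-associate each result with its request via the thread ID, so the
# final ordering is deterministic even though completion order is not.
ordered = [by_thread[tid] for tid, _ in inputs]
```

A software implementation with no thread IDs can simply return results in submission order, so code written against the ID-matching pattern remains portable across implementations.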

Additional Resources:

Part 1 of this blog focused on the execution of GME technology. Part 2 featured the API and GME implementations. And Part 3 will cover RTL, the GME interface and platform details.

If you are looking for more technical information or need to discuss your technical challenges with an expert, we are happy to help. Email us and we will arrange to have one of our technical specialists speak with you.  You can also sign up for updates. Finally, please follow us on social media so we can keep in touch.
