FutureSDR 2

03 February 2021 SDR / Rust / FPGA

This is the second post on FutureSDR, an experimental async SDR runtime, implemented in Rust. While the previous post gave a high-level overview of the core concepts, this and the following posts will be about more specific topics. Since I just finished integration of AXI DMA custom buffers for FPGA acceleration, we’ll discuss this.

...and custom buffers for FPGA acceleration (Xilinx AXI DMA). Both blocking and async mode :-) https://t.co/1vmxNMwiuv pic.twitter.com/tms5IzdhdX
— Bastian Bloessl (@bastibl) January 27, 2021

The goal of the post is threefold:

- Show that the custom buffers API, presented in the previous post, can easily be generalized to different types of accelerators.
- Show that FutureSDR can integrate the various interfaces and mechanisms to communicate between the FPGA and the CPU.
- Provide a complete walk-through of a minimal example that demonstrates how FPGAs can be integrated into SDR applications. (I didn’t find anything on this topic and hope that this shows the big picture and helps to get started.)

The actual example is simple. We’ll use the FPGA to add 123 to a 32-bit integer. The data will be DMA’ed to the FPGA, which will stream the integers through the adder and copy the data back to CPU memory. It’s only slightly more interesting than a loopback example, but the actual FPGA implementation is, of course, not the point. There are drop-in replacements for FIR filters, FFTs, and similar DSP cores.
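To check the data that comes back from the FPGA, it helps to have a CPU reference to compare against. The following is a minimal sketch in plain C++, not code from FutureSDR. The names `reference_adder` and `first_mismatch` are made up for illustration, and it assumes that the hardware adder wraps around modulo 2^32, as a 32-bit adder normally would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the FPGA adder: add 123 to every 32-bit word.
// Unsigned arithmetic keeps the wrap-around at 2^32 well defined.
// (Hypothetical helper, assumed behavior of the hardware.)
std::vector<uint32_t> reference_adder(const std::vector<uint32_t>& input) {
    std::vector<uint32_t> output;
    output.reserve(input.size());
    for (uint32_t word : input) {
        output.push_back(word + 123u);
    }
    return output;
}

// Compare the buffer that was copied back against the reference.
// Returns the index of the first mismatch, or -1 if both buffers agree.
long first_mismatch(const std::vector<uint32_t>& input,
                    const std::vector<uint32_t>& from_fpga) {
    assert(input.size() == from_fpga.size());
    const std::vector<uint32_t> expected = reference_adder(input);
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (expected[i] != from_fpga[i]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Calling `first_mismatch(sent, received)` after a transfer then gives a quick pass/fail check, including for inputs close to the 32-bit limit.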

I only have a Xilinx ZCU106 board with a Zynq UltraScale+ MPSoC. But the example doesn’t use any fancy features and should work on any Zynq platform.

Vivado HLS

Since I don’t know any hardware description languages (HDLs, like Verilog or VHDL), I used Vivado HLS to create the adder block. HLS can synthesize a subset of C/C++ and comes with helpful libraries for bus interfaces and FPGA-specific data types. I created a new project that only configures the target board, added an adder.cpp source file, and implemented an adder() function:

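#include "ap_axi_sdata.h"

// Adds 123 to every 32-bit word of the input stream.
void adder(ap_axis<32, 0, 0, 0> &in, ap_axis<32, 0, 0, 0> &out){
// Expose both arguments as AXI Stream ports.
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
// Omit the optional block-level control ports.
#pragma HLS INTERFACE ap_ctrl_none port=return

    out.data = in.data.to_int() + 123;
    // Forward the control signals unchanged.
    out.keep = in.keep;
    out.strb = in.strb;
    out.last = in.last;
}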

Its input and output are AXI Streams of 32-bit integers (ap_axis). ap_ stands for arbitrary precision integers. (In contrast to the CPU, the FPGA is not tied to 8/16/32/64-bit integers, but can work with any bit-width.) The axis extension, in turn, indicates an AXI stream. AXI is a set of bus protocols that can be used to connect IP blocks or other components on a chip. In this case, we use the stream interface to process each integer one-by-one. Since AXI is a standardized protocol, we can directly connect our adder to the DMA controller without having to reinvent the wheel.
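To make the "any bit-width" point more concrete, here is a small sketch in standard C++ that mimics what happens when a sum is stored in a register of an arbitrary width. The helpers `truncate_to_width` and `add_in_width` are hypothetical and only emulate the effect on the CPU. In actual HLS code one would use the arbitrary precision types from the HLS libraries (for example `ap_uint<12>` from `ap_int.h`) instead.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the lowest `bits` bits of `value`. This models storing a
// result in a register that is `bits` wide (hypothetical helper).
uint64_t truncate_to_width(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    const uint64_t mask = (uint64_t{1} << bits) - 1;
    return value & mask;
}

// Add two values and store the sum in a `bits`-wide register.
uint64_t add_in_width(uint64_t a, uint64_t b, unsigned bits) {
    return truncate_to_width(a + b, bits);
}
```

For example, `add_in_width(4000, 123, 12)` wraps around at 4096, while the same sum fits without wrapping once the register is 13 bits wide.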

While the interface type would, in this case, be clear from the function arguments, the interface has to be specified explicitly through directives. This is done through the Directives tab on the right. The directives can be stored as metadata of the project or added to the source. As you can see, I opted for the latter. The additional ap_ctrl_none directive disables optional ports that are mainly used for initialization, which is not needed in this case.

The example also forwards basic control signals of the AXI stream (keep, strb, and last). My understanding is that these signals are, in this example, not strictly required (according to the AXI spec), but the AXI DMA block expects them to be present.
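The role of the forwarded signals is easier to see in a software model of the stream. The sketch below is plain C++ and does not use the HLS headers. `Beat`, `to_stream`, `adder_model`, and `run_adder` are hypothetical names. It assumes that the reading DMA emits one beat per 32-bit word with all byte lanes valid and sets last on the final word of a transfer, and that the writing DMA uses last to detect the end of a packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software stand-in for one 32-bit AXI Stream beat.
struct Beat {
    uint32_t data;
    uint8_t keep;  // one bit per byte lane, 0xF = all four bytes valid
    uint8_t strb;  // one bit per byte lane, 0xF = all four bytes are data
    bool last;     // marks the final beat of a packet
};

// Assumed output of the reading DMA for one buffer:
// one beat per word, with last set only on the final word.
std::vector<Beat> to_stream(const std::vector<uint32_t>& words) {
    std::vector<Beat> beats;
    beats.reserve(words.size());
    for (std::size_t i = 0; i < words.size(); ++i) {
        const bool is_last = (i + 1 == words.size());
        beats.push_back(Beat{words[i], 0xF, 0xF, is_last});
    }
    return beats;
}

// Software model of the adder: change the data, forward everything else.
Beat adder_model(const Beat& in) {
    return Beat{in.data + 123u, in.keep, in.strb, in.last};
}

// Push a whole packet through the model, beat by beat.
std::vector<Beat> run_adder(const std::vector<Beat>& input) {
    std::vector<Beat> output;
    output.reserve(input.size());
    for (const Beat& beat : input) {
        output.push_back(adder_model(beat));
    }
    return output;
}
```

If the adder dropped last instead of forwarding it, the receiving side of this model would have no way to tell where the packet ends.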

Before we can synthesize the adder, we have to specify the top-level function. With only one function defined, the choice is obvious, but we still have to set it at Project -> Project Settings -> Synthesis -> Browse. To add the block to an FPGA design, we have to (1) synthesize and (2) package it for use in Vivado (see screenshot).

Vivado IP Integration

Having implemented the function that we want to accelerate, we are ready to create the FPGA design in Vivado. I created an empty example project for the board that just includes the processing system (PS, i.e., the ARM processor).

Based on this, we can implement a loopback example with two DMA blocks. It should look like this:

This requires us to:

- add two AXI DMAs
  - configure one only with a read channel and the other one only with a write channel
  - disable scatter gather
  - set buffer length register and address width to max
  - connect the M_AXI_MM2S output of the reader DMA and the M_AXI_S2MM output of the writer DMA to AXI inputs of the Zynq. The example connects them directly, which requires adapting bus settings manually. In this case, the DMAs use a data width of 32 bits, which also has to be configured for the inputs of the Zynq (double-click the block -> PS-PL Configuration -> PS-PL Interfaces -> Slave Interfaces, then search for the port and switch to 32-bit). Alternatively, we could add another interconnect between DMA and PS, which would adapt bus parameters automatically.
  - connect the M_AXIS_MM2S output of the reader DMA to the S_AXIS_S2MM input of the writer DMA. This closes the loop. This is the orange connection in the figure.
- add an AXI Interconnect, configure it for one input and two outputs, connect the input to a master output of the Zynq, and connect the outputs to the AXI Lite inputs of the DMAs
- add a Concat block with two inputs and one output that combines the interrupt outputs of the DMAs and feeds them to the pl_ps_irq0 input of the Zynq
- run connection automation on the remaining ports
- go to the Address Editor tab and auto-assign all addresses.
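The width of the buffer length register matters because it limits how many bytes a single DMA transfer can cover. The sketch below shows one way the software side could split a large buffer into transfers that fit. It is an illustration under assumptions, not code from FutureSDR: `max_transfer_bytes`, `Chunk`, and `split_transfer` are hypothetical names, and the supported width range of 8 to 26 bits is my reading of the AXI DMA documentation and should be checked against the IP configuration dialog.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest transfer, in bytes, that fits into a length register of the
// given width. The range 8..26 is an assumption about the AXI DMA IP.
uint32_t max_transfer_bytes(unsigned length_width_bits) {
    assert(length_width_bits >= 8 && length_width_bits <= 26);
    return (uint32_t{1} << length_width_bits) - 1;
}

// One DMA transfer: byte offset into the buffer and length in bytes.
struct Chunk {
    std::size_t offset;
    std::size_t length;
};

// Split a buffer into transfers that each fit the length register.
// Chunk sizes are rounded down to a multiple of 4 bytes, so that a
// 32-bit word is never split across two transfers.
std::vector<Chunk> split_transfer(std::size_t total_bytes,
                                  unsigned length_width_bits) {
    const std::size_t max_bytes =
        max_transfer_bytes(length_width_bits) & ~std::size_t{3};
    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    while (offset < total_bytes) {
        const std::size_t length = std::min(max_bytes, total_bytes - offset);
        chunks.push_back(Chunk{offset, length});
        offset += length;
    }
    return chunks;
}
```

With the register set to its maximum width, typical SDR buffers fit into a single transfer and the split is a no-op, which is the reason for choosing the largest setting in the design.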