Main Content

IP Core Generation Workflow with a MicroBlaze processor: Xilinx Kintex-7 KC705

This example shows how to use the HDL Coder™ IP Core Generation Workflow to develop reference designs for Xilinx® parts without an embedded ARM® processor present, but which still utilize the HDL Coder generated AXI interface to control the DUT. Specifically, this example will use the Xilinx Kintex-7 KC705 board and a MicroBlaze™ soft processor running a LightWeightIP (lwIP) based TCP/IP firmware server in the reference design to access the HDL Coder generated DUT registers from anywhere on the connected network. Further, this example will also highlight the difference between accessing data from a collection of registers implemented as multiple scalar ports and a collection of registers implemented as a single vector port.

Requirements

Xilinx Kintex-7 KC705 development board

Example Reference Designs

MicroBlaze is a simple, versatile soft-core processor that can be used in Xilinx FPGA only platforms, such as the Kintex-7, to perform the functionality a full-fledged processor, or as a flexible, programmable IP. When programs are small, the ELF can sit in BRAM and the design becomes completely self contained in the FPGA. There are many applications well suited to being implemented on a MicroBlaze. Here we list just a few:

  1. Remote networked control of a deployed IP Core algorithm

  2. Embedded web server for control and data display

  3. Integration of existing software algorithms to hardware-only platforms

The following is a system level diagram for this MicroBlaze system:

The reference design, "Xilinx MicroBlaze TCP/IP to AXI4-Lite Master", uses Vivado™ MicroBlaze IP to translate TCP/IP packets into AXI4-Lite reads and writes. Below is a block diagram of the complete system, including all the peripherals required to operate the TCP/IP server and debug via the UART serial console.

MicroBlaze Setup

In order to operate the TCP/IP server, the MicroBlaze IP needs a few basic peripherals:

  • local memory (BRAM) for data/instructions

  • Ethernet core for transmitting and receiving frames

  • UART core for sending debug messages

  • Timer core for generating timeout interrupts

  • Interrupt controller for handling interrupts from all these peripherals.

As can be seen in the address editor below, all these peripherals are connected to the MicroBlaze via AXI4 interfaces.

The amount of local memory allocated using BRAM is 1MB. This amount is needed to run the lwIP stack. The benefit of specifying BRAM for local memory is that the executable ELF can be included in the bitstream, which simplifies programming and enables targeting existing FPGA boards which may not have external DRAM memory. However, this convenience comes at a cost of increased utilization:

If BRAM utilization is a concern and DRAM resources are available, you may opt to replace local BRAM memory with external DRAM memory. See Xilinx app note "xapp1026" for more details on alternate configurations and application information.

Example Reference Design plugin_rd.m

The plugin_rd.m for this reference design is shown below:

function hRD = plugin_rd()
% Reference design definition
%   Copyright 2014-2018 The MathWorks, Inc.
% Construct reference design object
hRD = hdlcoder.ReferenceDesign('SynthesisTool', 'Xilinx Vivado');
hRD.ReferenceDesignName = 'Xilinx MicroBlaze TCP/IP to AXI4-Lite Master';
hRD.BoardName = 'Xilinx Kintex-7 KC705 development board';
% Tool information
hRD.SupportedToolVersion = {'2017.2','2017.4'};
%% Add custom design files
% add custom Vivado design
hRD.addCustomVivadoDesign( ...
  'CustomBlockDesignTcl', 'system_top.tcl',...
  'VivadoBoardPart',      'xilinx.com:kc705:part0:1.1');
% add custom files, use relative path
hRD.CustomFiles         = {'mw_lwip_tcpip_axi4.elf'};
%% Add interfaces
% add clock interface
hRD.addClockInterface( ...
  'ClockConnection',      'clk_wiz_0/clk_out1', ...
  'ResetConnection',      'proc_sys_reset_0/peripheral_aresetn',...
  'DefaultFrequencyMHz',  100,...
  'MinFrequencyMHz',      100,...
  'MaxFrequencyMHz',      100,...
  'ClockNumber',          1,...
  'ClockModuleInstance',  'clk_wiz_0');
% add AXI4 and AXI4-Lite slave interfaces
hRD.addAXI4SlaveInterface( ...
  'InterfaceConnection', 'microblaze_0_axi_periph/M04_AXI', ...
  'BaseAddress',         '0x44A00000',...
  'MasterAddressSpace',  'microblaze_0/Data',...
  'InterfaceType',       'AXI4-Lite',...
  'InterfaceID',         'MicroBlaze AXI4-Lite Interface');
hRD.HasProcessingSystem = false; % No hard processing system

Note that the reference design includes the MicroBlaze executable mw_lwip_tcpip_axi4.elf in the reference design property CustomFiles. This will copy the executable from the reference design to the Vivado project so that it can be associated with the MicroBlaze IP.

Additional code in 'system_top.tcl' to attach ELF to uBlaze

The 'system_top.tcl' file included in most reference designs is used to create the top level Vivado IP Integrator block diagram containing most of the reference design IP. Here we are adding some additional Tcl code to this file after the block diagram has been created to associate the standalone MicroBlaze ELF executable with the MicroBlaze IP.

   import_files -norecurse mw_lwip_tcpip_axi4.elf
   generate_target all [get_files system_top.bd]
   set_property SCOPED_TO_REF system_top [get_files -all -of_objects [get_fileset sources_1] {mw_lwip_tcpip_axi4.elf}]
   set_property SCOPED_TO_CELLS { microblaze_0 } [get_files -all -of_objects [get_fileset sources_1] {mw_lwip_tcpip_axi4.elf}]

Doing this allows the ELF to be packaged with the bitstream and programmed into the MicroBlaze BRAM memory at the same time as the FPGA.

Execute the IP Core Workflow

Using the above reference design you will generate an HDL IP Core that blinks LEDs on the KC705 board. You will then use tcpclient to send/receive formatted packets to the MicroBlaze to issue reads/writes over the AXI4-Lite interface to the generated HDL IP Core.

1. Add the MicroBlaze reference design files to the MATLAB path using the command:

example_root = (hdlcoder_amd_examples_root)
cd (example_root)
addpath(genpath('KC705'));

2. Set up the Xilinx Vivado tool path by using the following command:

>> hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');

Use your own Xilinx Vivado installation path when executing the command.

3. Open the Simulink model that implements LED blinking, as well as vector output ports for comparison to scalar ports, using the command:

open_system('hdlcoder_led_vector')

4. Launch HDL Workflow Advisor from the hdlcoder_led_vector/DUT subsystem by right-clicking the DUT subsystem, and selecting HDL Code > HDL Workflow Advisor.

5. Select reference design from the drop down in step 1.2

6. Assign register ports to the "MicroBlaze AXI4-Lite Interface". These will then be accessible at the hex offset shown in the table.

7. Run the remaining steps in the workflow to generate a bitstream and program the target device.

Determining Addresses from the IP Core Report

The Base Address for an HDL Coder IP Core is defined in the reference design plugin_rd.m with the following command:

% add AXI4 and AXI4-Lite slave interfaces
hRD.addAXI4SlaveInterface( ...
  'InterfaceConnection', 'microblaze_0_axi_periph/M04_AXI', ...
  'BaseAddress',         '0x44A00000',...
  'MasterAddressSpace',  'microblaze_0/Data',...
  'InterfaceType',       'AXI4-Lite',...
  'InterfaceID',         'MicroBlaze AXI4-Lite Interface');

For this design, the base address is 0x44A0_0000. The offsets can be found in the IP Core Report Register Address Mapping table:

Vector Data Read/Write with Strobe Synchronization

Vector data is supported on the AXI4/AXI4-Lite interfaces as of R2017a. Unlike a collection of scalar ports, all the elements of vector data are treated as synchronous to the IP Core algorithm logic. Additional strobe registers added for each vector input and output port maintain this synchronization across multiple sequential AXI4 reads/writes.

For input ports, the strobe register controls the enables on a set of shadow registers, allowing the IP core logic to see all the updated vector elements simultaneously. For output ports, the strobe register controls the synchronous capturing of vector data to be read. Below is a diagram of the synchronization logic generated with vector data:

Connect to the TCP/IP server

To start interacting with the TCP/IP server running on the MicroBlaze, first connect the UART serial console to view debug messages, which will help ensure things are working as expected. First find the serial port that is connected to the UART on the board:

Then use this port to connect using a program such as PuTTY™:

Once connected to the UART serial console, run the hdlworkflow_ProgramTargetDevice.m script to reprogram the board.

   >>hdlworkflow_ProgramTargetDevice
   ### Workflow begin.
   ### Loading settings from model.
   ### ++++++++++++++ Task Program Target Device ++++++++++++++
   ### Generated logfile: hdl_prj\hdlsrc\hdlcoder_led_vector\workflow_task_ProgramTargetDevice.log
   ### Task "Program Target Device" successful.
   ### Workflow complete.

In the console window, you should see the following header, displaying the IP address and port number the server is connected to.

NOTE: If the board is connected to a network with a DHCP server enabled, the IP information will be different than shown above. In this case, you will need to modify line 43 of the read_write_test.m script to connect to the correct IP address of the board:

t = tcpclient('192.168.1.10',7);

Sending AX4-Lite transactions to the MicroBlaze from MATLAB using tcpipclient

In order to issue reads and writes to the IP Core via TCP/IP and AXI4-Lite, the address,data and command to be performed must be encoded in the packet sent to the TCP/IP server. For this example, we use the following packet format:

               [--Address--]  32-bits
               [----Cmd----]  lower byte of 32-bit word (READ = 0, WRITE = 1, DEBUG = 2)
               [---Length---]  lower byte of 32-bit work (N<255)
               [----Data---]  32-bits, used only for WRITE cmd

For example, a packet issuing a read of 3 consecutive values starting at address 0x44a0010c:

               [44 a0 01 0c]
               [00 00 00 00]
               [00 00 00 03]

This will return data at offsets 0x10c,0x110,0x114.

For example, a packet issuing a write of 0x0 to offset 0x04, would be:

               [44 a0 00 04]
               [00 00 00 01]
               [00 00 00 01]
               [00 00 00 00]

To change the debug level used to print to the console, send a debug cmd packet:

               [xx xx xx xx]
               [00 00 00 03]
               [00 00 00 01] %0 = no msg, 1 = READ|WRITE, 2 = full pkt

Run the read_write_test.m script

This example includes a script which will setup a connection to the TCP/IP server running on the MicroBlaze, create commands to enable/disable the DUT, read 6 scalar ports and the same data as a vector port and compare the results.

1. To run this script, first copy it to your local directory

>> copyfile(fullfile('ublaze_lwip_read_write_vector_test.m'),'ublaze_test.m');

and open the script in the editor:

>> edit('ublaze_test.m');

The script has three sections. The first section, connects to the board and sets up the commands that will be used. If required, update the IP address of the board on line 41.

2. Execute section 1. You will have generated the following commands as arrays of uint32 types:

read6_cmd      = uint32([hex2dec('44a0010c') 0 6]);   %read 6 32-bit regs
read_vec_cmd   = uint32([hex2dec('44a00140') 0 6]);   %read 6 elements of vec
strobe_vec_cmd = uint32([hex2dec('44a00160') 1 1 1]); %write strobe for vec
enable_cmd     = uint32([hex2dec('44a00004') 1 1 1]); %enable ip core
disable_cmd    = uint32([hex2dec('44a00004') 1 1 0]); %disable ip core
debug0_cmd     = uint32([hex2dec('00000000') 3 0]);   %disable all debug printfs
debug1_cmd     = uint32([hex2dec('00000000') 3 1]);   %enable READ|WRITE printfs
debug2_cmd     = uint32([hex2dec('00000000') 3 2]);   %enable pkt printfs

NOTE: these arrays store data in the endian format used by the local machine, which for many x86 systems is the little endian format. However, the TCP/IP server expects values in the big endian format (network byte order). As a result, if the system you are on is little endian, the bytes in each element must be swapped using swapbytes.

3. Execute section 2 to disable the DUT logic and read a single counter value connected to the 6 scalar ports as well as all 6 elements in the vector port. Notice that all the counter values match. This is because the same data is driven to all the ports and the DUT is disabled, so the asynchronous access across the AXI4 interface is not apparent.

   Scalar port (top) vs vector port (bottom) access with DUT disabled:
      7e8aec14   7e8aec14   7e8aec14   7e8aec14   7e8aec14   7e8aec14
      7e8aec14   7e8aec14   7e8aec14   7e8aec14   7e8aec14   7e8aec14

4. Execute section 3 to re-enable the DUT logic and read the same counter values back. Notice that the 6 scalar ports all show different values, while the 6 elements of the vector port are all the same. This is due to the sequential access that must occur across the AXI4 interface and the lack of synchronization register in the scalar port case and the presence of an explicit synchronization register in the vector port case.

   Scalar port (top) vs vector port (bottom) access with DUT enabled:
      7f7796dc   7fc70e4b   8016860a   8065fdce   80b5758d   8104ed4d
      815964dd   815964dd   815964dd   815964dd   815964dd   815964dd

The corresponding debug output on the serial console will be:

Summary

This demo highlighted the use of a MicroBlaze soft-core processor in FPGA only designs. The MicroBlaze is well suited to function as a full-fledged processor or as a flexible IP running legacy C code as a firmware application. This demo also showed the difference between a collection of scalar ports and a vector port in regards to data synchronization across the AXI4 interface.

Appendix A: Creating and editing a Xilinx SDK application

This section will show how to create a new Xilinx SDK project and incorporate the code from this example to then modify or extend.

1. Open the Xilinx Vivado project by clicking the link in HDL Workflow Advisor step 4.1 "Create Project" :

2. Export the existing design, including the generated bitstream, to a local folder. Once this is done, go ahead and "Launch SDK" as well.

3. From within the SDK, create a new application project

4. You will then have the option of naming the project and creating a new bsp

Next you can choose from a few pre-configured example/template projects to get started. This example is built off of the "lwIP Echo Server" project, so select that now.

5. Using the echo server as a template, you can replace the following 3 methods with the code snippet below to modify the behavior of the server

Appendix B: Copy C file contents to project

/* Copyright 2016-2020 The MathWorks, Inc. */
#define IPCOREBASE 0x44a00000
#define WRITE  0x01
#define READ   0x00
#define DEBUG  0x03

int transfer_data() {
    return 0;
}

void print_app_header()
{
    xil_printf("\n\r\n\r-----MathWorks HDL Coder AXI4-Lite IP Core Read/Write Server ------\n\r");
    xil_printf("  TCP packets sent to port 7 will be issued as AXI4-Lite Read/Writes\n\r");
    xil_printf("\n\r");
    xil_printf("  [ 32-bit address ] (Base Address = 0x44a0_0000)\n\r");
    xil_printf("  [ 32-bit   cmd   ] (read = 0x00, write =0x01, debug = 0x03)\n\r");
    xil_printf("  [ 32-bit   len   ] ( N<255)\n\r");
    xil_printf("  [ 32-bit   data  ] (N 32-bit data values for write cmd)\n\r");
    xil_printf("------------------------------------------------------------ ------\n\r");

}

void print_packet(struct pbuf *p) {
    u16 ii;
    u8 *pktPtr;

    pktPtr = p->payload;
    xil_printf("DEBUG | packet payload:\r\n");
    for (ii=0;ii<p->len;ii+=4) {
        xil_printf("%02x %02x %02x %02x\r\n",*(pktPtr+ii),*(pktPtr+ii+1),*(pktPtr+ii+2),*(pktPtr+ii+3));
    }


}
err_t recv_callback(void *arg, struct tcp_pcb *tpcb,
                               struct pbuf *p, err_t err)
{
    u8 *pktPtr,*pktEnd;
    volatile u32 *addr;
    u32 data[255],cmd;
    u16 len;
    int ii;
    static u8 debug = 3;

    /* do not read the packet if we are not in ESTABLISHED state */
    if (!p) {
        tcp_close(tpcb);
        tcp_recv(tpcb, NULL);
        return ERR_OK;
    }

    /* indicate that the packet has been received */
    tcp_recved(tpcb, p->len);
    if (debug > 1) print_packet(p);

    //[ 32 bits address ]
    //[ 32 bits read = 0x00, write =0x01]
    //[ 32 bits length ]
    //[ 32 bits write data]
    pktPtr = p->payload;
    pktEnd = pktPtr+p->len;


    /* could be multiple commands per packet */
    while ( pktPtr < pktEnd) {
        addr = (u32*) (pktPtr[0]<<24 | pktPtr[1]<<16 | pktPtr[2]<<8 | pktPtr[3]);
        cmd  = (u32) pktPtr[7];      // cmd is 32 bits, but only 1st byte used, ignore rest
        pktPtr += 8;

        switch(cmd) {
        case WRITE :
            len  = (u32) pktPtr[3];  // len is 32 bits, but only 1st byte used, ignore rest
            pktPtr += 4;
            for (ii=0;ii<len; ii++) {
                data[0] = (u32) (pktPtr[0]<<24 | pktPtr[1]<<16 | pktPtr[2]<<8 | pktPtr[3]);
                *addr = data[0];
                if (debug > 0) xil_printf("WRITE | address: 0x%08x, data[0]: 0x%08x\r\n",addr,data[0]);
                addr++;
                pktPtr += 4;
            }
            break;
        case READ :
            len  = (u32)   pktPtr[3]; // len is 32 bits, but only 1st byte used, ignore rest
            pktPtr += 4;
            for (ii=0;ii<len; ii++) {
                data[ii] = *addr;
                if (debug > 0) xil_printf("READ  | address: 0x%08x, data[%d]: 0x%08x\r\n",addr,ii,data[ii]);
                addr++;
            }
            /* send the packet back */
            if (tcp_sndbuf(tpcb) > p->len)
                err = tcp_write(tpcb, data, 4*len, 1);
            else
                xil_printf("no space in tcp_sndbuf\n\r");
            break;
        case DEBUG:
            debug = pktPtr[3]; // only need the low byte
            pktPtr += 4;
            xil_printf("Debug level set to : 0x%02x\r\n",debug);
            break;
        default :
            xil_printf("INVALID | cmd: 0x%08x\r\n",cmd);

        }
    }
        /* free the received pbuf */
        pbuf_free(p);

        return ERR_OK;
}

6. Save the modified echo.c file and the application will be rebuilt.

Appendix C: Program the FPGA with ELF and bitstream

Now, you can program the FPGA using the exported bitstream and the newly created ELF file.