Run Sequence-to-Sequence Classification on FPGAs by Using Deep Learning HDL Toolbox

This example shows how to create, compile, and deploy a long short-term memory (LSTM) network trained on accelerometer data from human movement by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Use the deployed network to classify human activity based on sequence input data. Use MATLAB® to retrieve the prediction results from the target device.

The network attached to this example was trained by following the Sequence-to-Sequence Classification Using Deep Learning example. It uses sensor data obtained from a smartphone worn on the body and is trained to recognize the activity of the wearer from time series data that represents accelerometer readings in three different directions. The graphs below show the raw accelerometer readings over time and the resulting classifications. The training data contains time series data for seven people. Each sequence has three features and varies in length. The data set contains six training observations and one test observation.

[Figure: raw accelerometer readings over time and the resulting activity classifications (ClassificationResultImage.png)]

Prerequisites

  • Xilinx® Zynq® UltraScale+™ ZCU102 SoC development kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

Load the Pretrained Network

To load the pretrained human body movement network, enter:

load SequenceToSequenceClassification

View the layers of the network by using the analyzeNetwork function. The function returns a graphical representation of the network and detailed parameter settings of the layers in the network.

analyzeNetwork(net)

Define FPGA Board Interface

Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.

To create the target object, enter:

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');

To use the JTAG interface, install Xilinx™ Vivado™ Design Suite 2023.1. To set the Xilinx Vivado tool path, enter:

hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2023.1\bin\vivado.bat');
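
If you take the JTAG route, the target object also needs to request that interface. The fragment below is a minimal sketch that combines the tool path setup with a JTAG target. It assumes the Vivado path shown above matches your installation and that the support package is installed. It is a configuration fragment, not a replacement for the Ethernet setup used in the rest of this example.

```matlab
% Point MATLAB at the Vivado installation. The path is the one used in this
% example; adjust it to match your own installation.
hdlsetuptoolpath('ToolName','Xilinx Vivado', ...
    'ToolPath','C:\Xilinx\Vivado\2023.1\bin\vivado.bat');

% Create the target object with the JTAG interface instead of Ethernet.
hTarget = dlhdl.Target('Xilinx','Interface','JTAG');
```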

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and FPGA board. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the single data type.

hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zcu102_lstm_single','Target',hTarget);

To run the example on a Xilinx ZC706 board, enter:

hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zc706_lstm_single','Target',hTarget);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. The total number of frames in the test sequence exceeds the default InputFrameNumberLimit value of 30. Set the InputFrameNumberLimit name-value argument to 10000 so that predictions run in chunks of 10,000 frames, which prevents timeouts.

dn = compile(hW,'InputFrameNumberLimit',10000)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_lstm_single.
### The network includes the following layers:
     1   'sequenceinput'   Sequence Input          Sequence input with 3 dimensions                   (SW Layer)
     2   'lstm'            LSTM                    LSTM with 200 hidden units                         (HW Layer)
     3   'fc'              Fully Connected         5 fully connected layer                            (HW Layer)
     4   'softmax'         Softmax                 softmax                                            (SW Layer)
     5   'classoutput'     Classification Output   crossentropyex with 'Dancing' and 4 other classes  (SW Layer)
                                                                                                    
### Notice: The layer 'sequenceinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: lstm.wi ...
### Compiling layer group: lstm.wi ... complete.
### Compiling layer group: lstm.wo ...
### Compiling layer group: lstm.wo ... complete.
### Compiling layer group: lstm.wg ...
### Compiling layer group: lstm.wg ... complete.
### Compiling layer group: lstm.wf ...
### Compiling layer group: lstm.wf ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space  
    _______________________    ______________    __________________

    "InputDataOffset"           "0x00000000"     "160.0 kB"        
    "OutputResultOffset"        "0x00028000"     "316.0 kB"        
    "SchedulerDataOffset"       "0x00077000"     "616.0 kB"        
    "SystemBufferOffset"        "0x00111000"     "20.0 kB"         
    "InstructionDataOffset"     "0x00116000"     "4.0 kB"          
    "FCWeightDataOffset"        "0x00117000"     "648.0 kB"        
    "EndOffset"                 "0x001b9000"     "Total: 1764.0 kB"

### Network compilation complete.
dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy method of the dlhdl.Workflow object. This method uses the output of the compile method to program the FPGA board and download the network weights and biases. The deploy method programs the FPGA device and displays progress messages and the time required to deploy the network.

deploy(hW)
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Resetting network state.
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 10-Dec-2023 17:21:23

Load Human Activity Test Data

Load the human activity test data and classify the activity at each time step. XTest contains a single sequence with three features, which correspond to the accelerometer readings in three different directions. YTest contains a sequence of categorical labels that correspond to the activity at each time step.

load HumanActivityTest
numFeatures = 3;
figure
plot(XTest{1}')
xlabel("Time Step")
legend("Feature " + (1:numFeatures))
title("Test Data")

Run the Prediction

Classify the test data in MATLAB by using the classify function. This software result serves as a reference for the hardware predictions.

YPred = classify(hW.Network, XTest{1});

Calculate the accuracy of the prediction.

acc = sum(YPred == YTest{1})./numel(YTest{1})
acc = 0.9995
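
A single accuracy number can hide errors that are concentrated in one activity. The sketch below shows one way to break the accuracy down per class. It uses small made-up label vectors in place of YTest{1} and YPred so that it runs on its own. The class names follow the five activities used in this example, and the variable names are illustrative.

```matlab
% Class names in the order used by this example.
classNames = ["Dancing" "Running" "Sitting" "Standing" "Walking"];

% Hypothetical stand-ins for YTest{1} and YPred.
yTrue = categorical(["Sitting" "Sitting"  "Walking" "Walking" "Running" "Dancing"], classNames);
yPred = categorical(["Sitting" "Standing" "Walking" "Walking" "Running" "Dancing"], classNames);

% Overall accuracy, equivalent to sum(yPred == yTrue)./numel(yTrue).
overallAcc = mean(yPred == yTrue);

% Accuracy restricted to the time steps of each true class.
perClassAcc = zeros(1, numel(classNames));
for k = 1:numel(classNames)
    isClass = (yTrue == classNames(k));
    % mean of an empty selection is NaN, which marks a class with no samples
    perClassAcc(k) = mean(yPred(isClass) == yTrue(isClass));
end
```

To apply this to the example data, replace yTrue with YTest{1} and yPred with YPred.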

Compare the predictions with the test data by using a plot.

figure
plot(YPred,'.-')
hold on
plot(YTest{1})
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])

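Next, compare this graph to the output of the predict method.

Run the predict method of the dlhdl.Workflow object to retrieve the hardware prediction results. The sequence is passed in chunks of 10,000 time steps, matching the InputFrameNumberLimit value set at compile time, and the profiler is enabled for the first chunk only.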

predictions = hW.predict(XTest{1}(:,1:10000),Profile='on');
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 10000.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      76879                  0.00035                   10000          772126470           2849.3
    memSeparator_0              85                  0.00000 
    memSeparator_3             237                  0.00000 
    lstm.wi                  17918                  0.00008 
    lstm.wo                  18017                  0.00008 
    lstm.wg                  17997                  0.00008 
    lstm.wf                  18017                  0.00008 
    lstm.sigmoid_1             265                  0.00000 
    lstm.sigmoid_3             267                  0.00000 
    lstm.tanh_1                307                  0.00000 
    lstm.sigmoid_2             267                  0.00000 
    lstm.multiplication_2       427                  0.00000 
    lstm.multiplication_1       427                  0.00000 
    lstm.c_add                 421                  0.00000 
    lstm.tanh_2                301                  0.00000 
    memSeparator_2             227                  0.00000 
    lstm.multiplication_3       427                  0.00000 
    fc                        1061                  0.00000 
    memSeparator_1             211                  0.00000 
 * The clock frequency of the DL processor is: 220MHz
predictions = horzcat(predictions, hW.predict(XTest{1}(:,10001:20000)));
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 10000.
predictions = horzcat(predictions, hW.predict(XTest{1}(:,20001:30000)));
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 10000.
predictions = horzcat(predictions, hW.predict(XTest{1}(:,30001:40000)));
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 10000.
predictions = horzcat(predictions, hW.predict(XTest{1}(:,40001:50000)));
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 10000.
predictions = horzcat(predictions, hW.predict(XTest{1}(:,50001:end)));
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 3888.
save("hardwarepredictions.mat","predictions")
% Convert the hardware scores to activity labels. Each column of predictions
% holds the five class scores for one time step.
indices = [];
actions = [];
for x = 1:length(YPred)
    [~,idx] = max(predictions(:,x));   % index of the highest-scoring class
    indices = [indices idx];
    switch idx
        case 1
            actions = [actions categorical("Dancing")];
        case 2
            actions = [actions categorical("Running")];
        case 3
            actions = [actions categorical("Sitting")];
        case 4
            actions = [actions categorical("Standing")];
        case 5
            actions = [actions categorical("Walking")];
    end
end
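
The repeated predict calls and the label-mapping loop above can be condensed. The sketch below shows one possible approach: loop over the chunks with a preallocated score matrix, then take the column-wise maximum in a single call. To run without a board, it replaces the hardware call with a hypothetical placeholder function (predictFcn) and uses random data with a small chunk size. On hardware, predictFcn would wrap hW.predict and chunkSize would be 10000.

```matlab
% Class order taken from the switch statement above.
classNames = ["Dancing" "Running" "Sitting" "Standing" "Walking"];
numClasses = numel(classNames);

% Hypothetical stand-ins so the sketch runs without a board. On hardware,
% X would be XTest{1} and predictFcn would be @(x) hW.predict(x).
rng(0);
X = rand(3, 25);                 % 3 features, 25 time steps
W = rand(numClasses, 3);
predictFcn = @(x) W * x;         % placeholder scores, one column per time step
chunkSize = 10;                  % 10000 in this example

% Run the prediction chunk by chunk into a preallocated matrix.
numSteps = size(X, 2);
scores = zeros(numClasses, numSteps);
for first = 1:chunkSize:numSteps
    last = min(first + chunkSize - 1, numSteps);
    scores(:, first:last) = predictFcn(X(:, first:last));
end

% Pick the winning class for every time step in one call.
[~, idx] = max(scores, [], 1);
actions = categorical(classNames(idx), classNames);
```

Passing classNames as the category set keeps all five activities, in the same order, in the resulting categorical array, even when an activity never appears in the predictions.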

Plot the comparison between the FPGA board predictions and test data.

figure
plot(actions,'.-')
hold on
plot(YTest{1})
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])

The activities predicted on the hardware are similar to the activities that the classify function returns in MATLAB.
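
One way to put a number on "similar" is the fraction of time steps at which the hardware and software labels agree. The sketch below uses short made-up label vectors in place of actions (hardware) and YPred (software) so that it runs on its own. The variable names are illustrative.

```matlab
classNames = ["Dancing" "Running" "Sitting" "Standing" "Walking"];

% Hypothetical stand-ins for YPred (classify) and actions (hardware).
swLabels = categorical(["Sitting" "Sitting"  "Walking" "Running"], classNames);
hwLabels = categorical(["Sitting" "Standing" "Walking" "Running"], classNames);

% Compare as column vectors so row/column orientation does not matter.
agreement     = mean(hwLabels(:) == swLabels(:));   % fraction of matching steps
mismatchSteps = find(hwLabels(:) ~= swLabels(:));   % time steps that differ
```

To apply this to the example, replace swLabels with YPred and hwLabels with actions.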
