Deploy Transfer Learning Network for Lane Detection
This example shows how to create, compile, and deploy a dlhdl.Workflow object that has a convolutional neural network as the network object, using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. The network detects and outputs lane marker boundaries. Use MATLAB® to retrieve the prediction results from the target device.
Prerequisites
Xilinx ZCU102 SoC development kit
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Load the Pretrained SeriesNetwork
To load the pretrained series network lanenet, enter:
snet = getLaneDetectionNetwork();
Normalize the Input Layer
To replace the input layer with one whose Normalization property is set to 'none', enter:
inputlayer = imageInputLayer(snet.Layers(1).InputSize, 'Normalization','none');
snet = SeriesNetwork([inputlayer; snet.Layers(2:end)]);
To view the layers of the pretrained series network, enter:
analyzeNetwork(snet)
% The saved network contains 23 layers including input, convolution, ReLU, cross channel normalization,
% max pool, fully connected, and the regression output layers.
Create Target Object
Create a target object that has a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Create Workflow Object
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained lanenet neural network, snet, as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the single data type.
hW = dlhdl.Workflow('Network', snet, 'Bitstream', 'zcu102_single','Target',hTarget);
% If running on Xilinx ZC706 board, instead of the above command,
% uncomment the command below.
%
% hW = dlhdl.Workflow('Network', snet, 'Bitstream', 'zc706_single','Target',hTarget);
Compile the Lanenet Series Network
To compile the lanenet series network, run the compile function of the dlhdl.Workflow object.
dn = hW.compile;
offset_name                offset_address    allocated_space
_______________________    ______________    _________________

"InputDataOffset"          "0x00000000"      "24.0 MB"
"OutputResultOffset"       "0x01800000"      "4.0 MB"
"SystemBufferOffset"       "0x01c00000"      "28.0 MB"
"InstructionDataOffset"    "0x03800000"      "4.0 MB"
"ConvWeightDataOffset"     "0x03c00000"      "16.0 MB"
"FCWeightDataOffset"       "0x04c00000"      "148.0 MB"
"EndOffset"                "0x0e000000"      "Total: 224.0 MB"
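The memory map printed by the compile function can be sanity-checked by hand: each region should start where the previous one ends, and the end offset should equal the reported total. The following is a minimal sketch that re-enters the table values manually. The variable names are illustrative and are not produced by the toolbox.

```matlab
% Values copied by hand from the compile output; names are illustrative.
addrHex = {'00000000','01800000','01c00000','03800000','03c00000','04c00000'};
sizeMB  = [24 4 28 4 16 148];      % allocated_space column, in MB
endHex  = '0e000000';              % EndOffset

MB = 2^20;                         % bytes per megabyte as used in the table
addr = hex2dec(addrHex);
addr = addr(:);                    % force a column vector of start addresses
regionEnd = addr + sizeMB(:) * MB; % first byte after each region
nextStart = [addr(2:end); hex2dec(endHex)];

% Each region should end exactly where the next one begins.
assert(isequal(regionEnd, nextStart), 'Regions are not contiguous.');
totalMB = hex2dec(endHex) / MB;
fprintf('Total allocated: %.1f MB\n', totalMB);
```

If the assertion passes, the offsets and sizes in the table are mutually consistent, and the printed total should agree with the "Total" entry in the EndOffset row.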
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.
hW.deploy
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Loading weights to FC Processor.
### 13% finished, current time is 28-Jun-2020 12:36:09.
### 25% finished, current time is 28-Jun-2020 12:36:10.
### 38% finished, current time is 28-Jun-2020 12:36:11.
### 50% finished, current time is 28-Jun-2020 12:36:12.
### 63% finished, current time is 28-Jun-2020 12:36:13.
### 75% finished, current time is 28-Jun-2020 12:36:14.
### 88% finished, current time is 28-Jun-2020 12:36:14.
### FC Weights loaded. Current time is 28-Jun-2020 12:36:15
Run Prediction for Example Video
Run the demoOnVideo function for the dlhdl.Workflow class object. This function loads the example video, executes the predict function of the dlhdl.Workflow object, and then plots the result.
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------              -------------         ---------     ---------     ---------
Network                    24904175                   0.11320                 1         24904217         8.8
    conv_module             8967009                   0.04076
        conv1               1396633                   0.00635
        norm1                623003                   0.00283
        pool1                226855                   0.00103
        conv2               3410044                   0.01550
        norm2                378531                   0.00172
        pool2                233635                   0.00106
        conv3               1139419                   0.00518
        conv4                892918                   0.00406
        conv5                615897                   0.00280
        pool5                 50189                   0.00023
    fc_module              15937166                   0.07244
        fc6                15819257                   0.07191
        fcLane1              117125                   0.00053
        fcLane2                 782                   0.00000
 * The clock frequency of the DL processor is: 220MHz
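The profiler reports latency in clock cycles, and the seconds and Frames/s columns follow from the 220 MHz clock frequency printed at the bottom of the results. The following is a minimal sketch of that arithmetic, using values copied by hand from the profiler output. The variable names are illustrative and are not part of the toolbox.

```matlab
% Values copied by hand from the profiler output; names are illustrative.
clockHz       = 220e6;      % DL processor clock frequency (220 MHz)
totalCycles   = 24904217;   % "Total Latency" for one frame
networkCycles = 24904175;   % LastLayerLatency(cycles) for Network
convCycles    = 8967009;    % conv_module
fcCycles      = 15937166;   % fc_module

latencySec = totalCycles / clockHz;   % seconds per frame
fps        = 1 / latencySec;          % frames per second

% Share of the network latency spent in each module.
convShare = 100 * convCycles / networkCycles;
fcShare   = 100 * fcCycles   / networkCycles;

fprintf('Latency: %.5f s, throughput: %.1f frames/s\n', latencySec, fps);
fprintf('conv_module: %.1f%%, fc_module: %.1f%%\n', convShare, fcShare);
```

With these numbers, the fully connected module accounts for roughly two thirds of the per-frame latency, and most of that is spent in the fc6 layer.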