Code Generation for Object Detection Using YOLO v2

This example shows how to generate CUDA® code for the Object Detection Using YOLO v2 Deep Learning example from the Computer Vision Toolbox™.


Third-Party Prerequisites

  • CUDA®-enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library v7 or higher.

  • OpenCV 3.1.0 libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • Deep Learning Toolbox™ for using SeriesNetwork objects.

  • GPU Coder™ for generating CUDA code.

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

Verify the GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Get the Pretrained DAGNetwork

net = getYOLOv2();

The DAG network contains 150 layers, including convolution, ReLU, and batch normalization layers, along with the YOLO v2 transform and YOLO v2 output layers. Use the command net.Layers to see all the layers of the network.
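To inspect the network interactively, you can list its layers or open the network analyzer (a quick inspection sketch, assuming the pretrained network from this example is on the path):

```matlab
% List the layers of the pretrained YOLO v2 network.
net = getYOLOv2();
net.Layers            % displays all 150 layers in the Command Window

% Optionally, open the interactive network analyzer from
% Deep Learning Toolbox for a graphical view of the architecture.
analyzeNetwork(net)
```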


About the 'yolov2_detect' Function

The yolov2_detect.m function takes an image input and runs the detector on the image using the deep learning network saved in the yolov2ResNet50VehicleExample.mat file. The function loads the network object from yolov2ResNet50VehicleExample.mat into a persistent variable yolov2Obj. On subsequent calls to the function, the persistent object is reused for detection.

function outImg = yolov2_detect(in)

%   Copyright 2018-2019 The MathWorks, Inc.

% A persistent object yolov2Obj is used to load the YOLOv2ObjectDetector object.
% At the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is reused 
% to call detection on inputs, thus avoiding reconstructing and reloading the
% network object.
persistent yolov2Obj;

if isempty(yolov2Obj)
    yolov2Obj = coder.loadDeepLearningNetwork('yolov2ResNet50VehicleExample.mat');
end

% Run the detector on the input image.
[bboxes,~,labels] = yolov2Obj.detect(in,'Threshold',0.5);

% Annotate detections in the image.
outImg = insertObjectAnnotation(in,'rectangle',bboxes,labels);
end

Run MEX Code Generation for 'yolov2_detect' Function

To generate CUDA code from the design file yolov2_detect.m, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command, specifying an input of size [224,224,3]. This value corresponds to the input layer size of the YOLO v2 network.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg yolov2_detect -args {ones(224,224,3,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/yolov2_detect/html/report.mldatx').
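Before processing the full video, the generated MEX can be sanity-checked on a single image (a sketch; the file name 'testFrame.png' is hypothetical and stands in for any RGB image on the path):

```matlab
% Read a test image -- 'testFrame.png' is a hypothetical placeholder.
I = imread('testFrame.png');

% Resize to match the [224,224,3] input size used during code generation.
in = imresize(I,[224,224]);

% Call the generated MEX and display the annotated result.
out = yolov2_detect_mex(in);
imshow(out)
```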

Run the Generated MEX

Set up the video file reader and read the input video. Create a video player to display the video and the output detections.

videoFile = 'highway_lanechange.mp4';
videoFreader = vision.VideoFileReader(videoFile,'VideoOutputDataType','uint8');
depVideoPlayer = vision.DeployableVideoPlayer('Size','Custom','CustomSize',[640 480]);

Read the video input frame-by-frame and detect the vehicles in the video using the detector.

cont = ~isDone(videoFreader);
while cont
    I = step(videoFreader);
    in = imresize(I,[224,224]);
    out = yolov2_detect_mex(in);
    step(depVideoPlayer, out);
    % Exit the loop when the video ends or the player window is closed.
    cont = ~isDone(videoFreader) && isOpen(depVideoPlayer);
end
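After the loop exits, the reader and player System objects can be released (a suggested cleanup sketch, not shown in the original example):

```matlab
% Release the System objects to close the video file and player window.
release(videoFreader);
release(depVideoPlayer);
```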


References

[1] Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.