trainYOLOv2ObjectDetector

Train YOLO v2 object detector

Syntax

detector = trainYOLOv2ObjectDetector(trainingData,lgraph,options)
detector = trainYOLOv2ObjectDetector(trainingData,checkpoint,options)
detector = trainYOLOv2ObjectDetector(trainingData,detector,options)
detector = trainYOLOv2ObjectDetector(___,'MultiScaleTrainingSizes',trainingSizes)
[detector,info] = trainYOLOv2ObjectDetector(___)

Description

detector = trainYOLOv2ObjectDetector(trainingData,lgraph,options) returns an object detector trained using the you only look once version 2 (YOLO v2) network architecture specified by the input lgraph. The options input specifies training parameters for the detection network.

detector = trainYOLOv2ObjectDetector(trainingData,checkpoint,options) resumes training from the saved detector checkpoint.

You can use this syntax to:

  • Add more training data and continue the training.

  • Improve training accuracy by increasing the maximum number of iterations.

detector = trainYOLOv2ObjectDetector(trainingData,detector,options) continues training a YOLO v2 object detector. Use this syntax for fine-tuning a detector.

detector = trainYOLOv2ObjectDetector(___,'MultiScaleTrainingSizes',trainingSizes) specifies the image sizes for multiscale training by using a name-value pair in addition to the input arguments in any of the preceding syntaxes.

[detector,info] = trainYOLOv2ObjectDetector(___) also returns information on the training progress, such as the training accuracy and learning rate for each iteration.

Examples

Load the training data for vehicle detection into the workspace.

data = load('vehicleTrainingData.mat');
trainingData = data.vehicleTrainingData;

Specify the directory in which the training samples are stored, and add the full path to the file names in the training data.

dataDir = fullfile(toolboxdir('vision'),'visiondata');
trainingData.imageFilename = fullfile(dataDir,trainingData.imageFilename);

Load the data file containing the YOLO v2 network into the workspace, and set up the network as a LayerGraph object.

net = load('yolov2VehicleDetector.mat');
lgraph = net.lgraph
lgraph = 
  LayerGraph with properties:

         Layers: [25×1 nnet.cnn.layer.Layer]
    Connections: [24×2 table]

Inspect the layers in the YOLO v2 network and their properties. You can also create the YOLO v2 network by following the steps given in Create YOLO v2 Object Detection Network.

lgraph.Layers
ans = 
  25x1 Layer array with layers:

     1   'input'               Image Input               128x128x3 images
     2   'conv_1'              Convolution               16 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   'BN1'                 Batch Normalization       Batch normalization
     4   'relu_1'              ReLU                      ReLU
     5   'maxpool1'            Max Pooling               2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     6   'conv_2'              Convolution               32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     7   'BN2'                 Batch Normalization       Batch normalization
     8   'relu_2'              ReLU                      ReLU
     9   'maxpool2'            Max Pooling               2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    10   'conv_3'              Convolution               64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'BN3'                 Batch Normalization       Batch normalization
    12   'relu_3'              ReLU                      ReLU
    13   'maxpool3'            Max Pooling               2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    14   'conv_4'              Convolution               128 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'BN4'                 Batch Normalization       Batch normalization
    16   'relu_4'              ReLU                      ReLU
    17   'yolov2Conv1'         Convolution               128 3x3 convolutions with stride [1  1] and padding 'same'
    18   'yolov2Batch1'        Batch Normalization       Batch normalization
    19   'yolov2Relu1'         ReLU                      ReLU
    20   'yolov2Conv2'         Convolution               128 3x3 convolutions with stride [1  1] and padding 'same'
    21   'yolov2Batch2'        Batch Normalization       Batch normalization
    22   'yolov2Relu2'         ReLU                      ReLU
    23   'yolov2ClassConv'     Convolution               24 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
    24   'yolov2Transform'     YOLO v2 Transform Layer   YOLO v2 Transform Layer with 4 anchors
    25   'yolov2OutputLayer'   YOLO v2 Output            YOLO v2 Output with 4 anchors

Configure the network training options.

  • Set the solver to stochastic gradient descent with momentum (sgdm).

  • Set the initial learning rate for training.

  • Set the verbose indicator to display training progress information in the Command Window.

  • Set the mini-batch size used for each training iteration. Reduce the mini-batch size to lower memory usage during training.

  • Set the maximum number of epochs for training.

  • Shuffle the training data before each epoch.

  • Specify the frequency of verbose output.

  • Specify the path for saving checkpoint networks. You can use this option to resume training from any saved checkpoint network.

options = trainingOptions('sgdm',...
          'InitialLearnRate',0.001,...
          'Verbose',true,...
          'MiniBatchSize',16,...
          'MaxEpochs',30,...
          'Shuffle','every-epoch',...
          'VerboseFrequency',30,...
          'CheckpointPath',tempdir);

Train the YOLO v2 network.

[detector,info] = trainYOLOv2ObjectDetector(trainingData,lgraph,options);
Training on single CPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |         7.64 |         58.3 |          0.0010 |
|       2 |          30 |       00:00:22 |         1.57 |          2.5 |          0.0010 |
|       4 |          60 |       00:00:45 |         1.40 |          1.9 |          0.0010 |
|       5 |          90 |       00:01:08 |         1.24 |          1.5 |          0.0010 |
|       7 |         120 |       00:01:30 |         0.94 |          0.9 |          0.0010 |
|       9 |         150 |       00:01:52 |         1.19 |          1.4 |          0.0010 |
|      10 |         180 |       00:02:14 |         0.93 |          0.9 |          0.0010 |
|      12 |         210 |       00:02:38 |         0.73 |          0.5 |          0.0010 |
|      14 |         240 |       00:03:01 |         0.73 |          0.5 |          0.0010 |
|      15 |         270 |       00:03:23 |         0.77 |          0.6 |          0.0010 |
|      17 |         300 |       00:03:46 |         0.62 |          0.4 |          0.0010 |
|      19 |         330 |       00:04:09 |         0.62 |          0.4 |          0.0010 |
|      20 |         360 |       00:04:32 |         0.61 |          0.4 |          0.0010 |
|      22 |         390 |       00:04:55 |         0.63 |          0.4 |          0.0010 |
|      24 |         420 |       00:05:18 |         0.60 |          0.4 |          0.0010 |
|      25 |         450 |       00:05:42 |         0.79 |          0.6 |          0.0010 |
|      27 |         480 |       00:06:05 |         0.56 |          0.3 |          0.0010 |
|      29 |         510 |       00:06:29 |         0.51 |          0.3 |          0.0010 |
|      30 |         540 |       00:06:51 |         0.50 |          0.2 |          0.0010 |
|========================================================================================|

Inspect the properties of the detector.

detector
detector = 
  yolov2ObjectDetector with properties:

            ModelName: 'vehicle'
              Network: [1×1 DAGNetwork]
           ClassNames: {'vehicle'}
          AnchorBoxes: [4×2 double]
    TrainingImageSize: [128 128]

You can verify the training accuracy by inspecting the training loss for each iteration.

figure
plot(info.TrainingLoss)
grid on
xlabel('Number of Iterations')
ylabel('Training Loss for Each Iteration')

Read a test image into the workspace.

img = imread('detectcars.png');

Run the trained YOLO v2 object detector on the test image for vehicle detection.

[bboxes,scores] = detect(detector,img);

Display the detection results.

if(~isempty(bboxes))
    img = insertObjectAnnotation(img,'rectangle',bboxes,scores);
end
figure
imshow(img)

Input Arguments

Labeled ground truth images, specified as a table with two or more columns. The first column must contain the paths and file names of grayscale or truecolor (RGB) images. The remaining columns must contain the ground truth data for the object classes in the input images. Each column represents a single object class, such as a car, dog, flower, or stop sign.

The ground truth bounding boxes must be in the format [x y width height], which specifies the upper-left corner location and the size of the object in the corresponding image. The table variable name defines the object class name. To create the ground truth table, use the Image Labeler or Video Labeler app.
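
A minimal sketch of the expected table format, using hypothetical file names and bounding boxes for a single class. The table variable name (vehicle, here) defines the object class name.

imageFilename = {'image1.jpg';'image2.jpg'};            % image paths
vehicle = {[100 50 40 30];[60 80 50 35; 10 20 45 30]};  % one or more [x y width height] boxes per image
trainingData = table(imageFilename,vehicle);            % variable name defines the class name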

Layer graph, specified as a LayerGraph object. The layer graph contains the architecture of the YOLO v2 network. You can create this network by using the yolov2Layers function. Alternatively, you can create the network layers by using the yolov2TransformLayer, yolov2ReorgLayer, and yolov2OutputLayer functions. For more details on creating a custom YOLO v2 network, see Design a YOLO v2 Detection Network.

Training options, specified as a TrainingOptionsSGDM, TrainingOptionsRMSProp, or TrainingOptionsADAM object returned by the trainingOptions function. To specify the solver name and other options for network training, use the trainingOptions function.

Note

The trainYOLOv2ObjectDetector function does not support these training options:

  • The 'training-progress' value of the Plots training option

  • The ValidationData, ValidationFrequency, or ValidationPatience training options

  • The OutputFcn training option

Saved detector checkpoint, specified as a yolov2ObjectDetector object. To save the detector after every epoch, set the 'CheckpointPath' name-value argument when using the trainingOptions function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the CheckpointPath property of the object specified by options is '/checkpath', you can load a checkpoint MAT-file by using this code.

data = load('/checkpath/yolov2_checkpoint__216__2018_11_16__13_34_30.mat');
checkpoint = data.detector;

The name of the MAT-file includes the iteration number and timestamp of when the detector checkpoint was saved. The detector is saved in the detector variable of the file. Pass this file back into the trainYOLOv2ObjectDetector function:

yoloDetector = trainYOLOv2ObjectDetector(trainingData,checkpoint,options);

Previously trained YOLO v2 object detector, specified as a yolov2ObjectDetector object. Use this syntax to continue training a detector with additional training data or to perform more training iterations to improve detector accuracy.

Set of image sizes for multiscale training, specified as an M-by-2 matrix, where each row is of the form [height width]. For each training epoch, the input training images are randomly resized to one of the M image sizes specified in this set.

If you do not specify trainingSizes, the function sets this value to the size specified in the image input layer of the YOLO v2 network, and the network resizes all training images to that size.

Note

The input trainingSizes values specified for multiscale training must be greater than or equal to the input size in the image input layer of the lgraph input argument.
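
For example, this minimal sketch uses hypothetical sizes for a network whose image input layer is 128-by-128. Each [height width] row is at least as large as that input size.

trainingSizes = [128 128; 160 160; 192 192];  % each row >= image input layer size
detector = trainYOLOv2ObjectDetector(trainingData,lgraph,options,...
           'MultiScaleTrainingSizes',trainingSizes);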

Output Arguments

Trained YOLO v2 object detector, returned as a yolov2ObjectDetector object. You can train a YOLO v2 object detector to detect multiple object classes.

Training progress information, returned as a structure with these fields:

  • TrainingLoss — Training loss at each iteration is the mean squared error (MSE) calculated as the sum of localization error, confidence loss, and classification loss. For more information about the training loss function, see Training Loss.

  • TrainingRMSE — Training root mean squared error (RMSE) is the RMSE calculated from the training loss at each iteration.

  • BaseLearnRate — Learning rate at each iteration.

Each field is a numeric vector with one element per training iteration. Values that are not calculated at a specific iteration are represented by NaN.

More About

Training Loss

During training, the YOLO v2 object detection network optimizes the MSE loss between the predicted bounding boxes and the ground truth. The loss function is defined as

$$
\begin{aligned}
\text{loss} ={} & K_1\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
 &+ K_1\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
 &+ K_2\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\left(C_i-\hat{C}_i\right)^2 \\
 &+ K_3\sum_{i=1}^{S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{noobj}}\left(C_i-\hat{C}_i\right)^2 \\
 &+ K_4\sum_{i=1}^{S}\mathbb{1}_{i}^{\mathrm{obj}}\sum_{c\in\text{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

where:

  • S is the number of grid cells.

  • B is the number of bounding boxes in each grid cell.

  • $\mathbb{1}_{ij}^{\mathrm{obj}}$ is 1 if the jth bounding box in grid cell i is responsible for detecting the object. Otherwise, it is set to 0. A grid cell i is responsible for detecting the object if the overlap between the ground truth and a bounding box in that grid cell is greater than or equal to 0.6.

  • $\mathbb{1}_{ij}^{\mathrm{noobj}}$ is 1 if the jth bounding box in grid cell i does not contain any object. Otherwise, it is set to 0.

  • $\mathbb{1}_{i}^{\mathrm{obj}}$ is 1 if an object is detected in grid cell i. Otherwise, it is set to 0.

  • $K_1$, $K_2$, $K_3$, and $K_4$ are the weights. To adjust the weights, set the LossFactors property of the output layer by using the yolov2OutputLayer function, as shown in the sketch after this list.
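
A minimal sketch of adjusting the weights, assuming four hypothetical anchor boxes and a single hypothetical class named 'vehicle'. The LossFactors property is specified as the 1-by-4 vector [K1 K2 K3 K4].

anchorBoxes = [1 1; 4 6; 5 3; 9 6];          % hypothetical anchors, [height width]
outputLayer = yolov2OutputLayer(anchorBoxes,...
              'Classes',{'vehicle'},...
              'LossFactors',[5 1 0.5 1]);    % [K1 K2 K3 K4]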

The loss function can be split into three parts:

  • Localization loss

    The first and second terms in the loss function comprise the localization loss, which measures the error between the predicted bounding box and the ground truth. The parameters for computing the localization loss are defined as follows.

    • $(x_i,\,y_i)$ is the center of the jth bounding box relative to grid cell i.

    • $(\hat{x}_i,\,\hat{y}_i)$ is the center of the ground truth relative to grid cell i.

    • $w_i$ and $h_i$ are the width and the height of the jth bounding box in grid cell i, respectively. The size of the predicted bounding box is specified relative to the input image size.

    • $\hat{w}_i$ and $\hat{h}_i$ are the width and the height of the ground truth in grid cell i, respectively.

    • $K_1$ is the weight for the localization loss. Increase this value to increase the weight given to bounding box prediction errors.

  • Confidence loss

    The third and fourth terms in the loss function comprise the confidence loss. The third term measures the objectness (confidence score) error when an object is detected in the jth bounding box of grid cell i. The fourth term measures the objectness error when no object is detected in the jth bounding box of grid cell i. The parameters for computing the confidence loss are defined as follows.

    • $C_i$ is the confidence score of the jth bounding box in grid cell i.

    • $\hat{C}_i$ is the confidence score of the ground truth in grid cell i.

    • $K_2$ is the weight for the objectness error when an object is detected in the predicted bounding box. Increase this value to increase the weight given to bounding boxes and grid cells that contain an object.

    • $K_3$ is the weight for the objectness error when an object is not detected in the predicted bounding box. Decrease this value to decrease the weight given to bounding boxes and grid cells that do not contain any object. Decreasing this weight prevents the network from learning to detect the background instead of the objects.

  • Classification loss

    The fifth term in the loss function is the classification loss. For example, suppose that an object is detected in the predicted bounding box contained in grid cell i. Then the classification loss measures the squared error between the estimated and actual conditional class probabilities for each class in grid cell i. The parameters for computing the classification loss are defined as follows.

    • $p_i(c)$ is the estimated conditional class probability for object class c in grid cell i.

    • $\hat{p}_i(c)$ is the actual conditional class probability for object class c in grid cell i.

    • $K_4$ is the weight for the classification error when an object is detected in the grid cell. Increase this value to increase the weight given to the classification loss.

References

[1] Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. Las Vegas, NV: CVPR, 2016.

[2] Redmon, J., and A. Farhadi. "YOLO9000: Better, Faster, Stronger." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525. Honolulu, HI: CVPR, 2017.

Introduced in R2019a