SSD multibox object detection network



lgraph = ssdLayers(imageSize,numClasses,baseNetwork) creates a single shot detector (SSD) multibox object detection network based on baseNetwork, the input image size, and the number of classes that the network is configured to classify. The network is returned as a LayerGraph (Deep Learning Toolbox) object.

The SSD is a convolutional neural network-based object detector that predicts bounding box coordinates, classification scores, and corresponding class labels.

lgraph = ssdLayers(___,anchorBoxes,predictorLayerNames) returns an SSD network that contains the custom anchor boxes specified by anchorBoxes, connected to the network layers named in predictorLayerNames. Specify these arguments in addition to the input arguments from the previous syntax.



Specify the base network.

baseNetwork = 'vgg16';

Specify the image size.

imageSize = [300 300 3];

Specify the classes to detect.

numClasses = 2;

Create the SSD object detection network.

lgraph = ssdLayers(imageSize,numClasses,baseNetwork);

Visualize the network using the network analyzer.
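The visualization step can be sketched as follows. This is a minimal example that assumes Deep Learning Toolbox and the VGG-16 support package are installed, and it repeats the ssdLayers call with the same values as above so that the snippet runs on its own.

```matlab
% Recreate the SSD layer graph with the same inputs as in the example above.
lgraph = ssdLayers([300 300 3],2,'vgg16');

% Open the network analyzer on the layer graph to inspect the layers,
% connections, and activation sizes interactively.
analyzeNetwork(lgraph)
```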


Input Arguments


Size of the input image (imageSize), specified as one of these values:

  • Two-element vector of the form [H W] for a grayscale image of size H-by-W

  • Three-element vector of the form [H W 3] for an RGB color image of size H-by-W

When you set the baseNetwork input to 'vgg16', 'resnet50', or 'resnet101', the imageSize input must be of the form [H W 3].

Number of classes for the network to classify (numClasses), specified as a positive scalar.

Pretrained convolutional neural network (baseNetwork), specified as a LayerGraph (Deep Learning Toolbox), DAGNetwork (Deep Learning Toolbox), or SeriesNetwork (Deep Learning Toolbox) object, or as a network name such as 'vgg16', 'resnet50', or 'resnet101'. To specify a network by name, you must first download and install the support package for that network.

The pretrained convolutional neural network is used as the base for the SSD multibox object detection network. For details on pretrained networks in MATLAB®, see Pretrained Deep Neural Networks (Deep Learning Toolbox).

Anchor boxes (anchorBoxes), specified as a 1-by-M cell array, where M is the number of predictor layers in the SSD network. Each element of the cell array is a K-by-2 matrix that defines K anchor boxes of the form [height width] for the corresponding predictor layer. K can differ from one element to the next.

The size of each anchor box is determined based on the scale and aspect ratio of different object classes present in input training data. The size of each anchor box must be smaller than or equal to the size of the input image. You can use the clustering approach for estimating anchor boxes from the training data. For more information, see Estimate Anchor Boxes From Training Data.
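As an illustration of the format and the size constraint described above, the following sketch builds a cell array of anchor boxes for two predictor layers and checks it against the image size. The box dimensions are hypothetical values chosen for illustration, not estimates from real training data.

```matlab
imageSize = [300 300 3];   % [H W 3], as in the example above

% Hypothetical anchor boxes for M = 2 predictor layers.
% Each row is one box in the form [height width]; K differs per element.
anchorBoxes = {[30 60; 60 30; 50 50; 100 100], ...
               [40 70; 70 40; 60 60; 120 120; 200 200]};

% Each element must be K-by-2, and no box may exceed the image size.
isValid = cellfun(@(b) size(b,2) == 2 ...
                       && all(b(:,1) <= imageSize(1)) ...
                       && all(b(:,2) <= imageSize(2)), anchorBoxes);

% Number of anchor boxes (K) defined for each predictor layer.
numBoxesPerLayer = cellfun(@(b) size(b,1), anchorBoxes);
```

A check like this catches boxes that are larger than the input image before the cell array is passed to ssdLayers.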

Names of the predictor layers in the base network (predictorLayerNames), specified as an M-element vector of strings or a 1-by-M cell array of character vectors. The SSD detection subnetworks are attached to the layers named by this input.
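A call that uses the second syntax might look like the sketch below. It assumes that the ResNet-50 support package is installed and that 'activation_22_relu' and 'activation_40_relu' are valid layer names in that network. The layer names and anchor box sizes are illustrative assumptions, so check them against your own base network (for example, with analyzeNetwork) before relying on them.

```matlab
imageSize  = [300 300 3];
numClasses = 2;

% Assumed predictor layers in ResNet-50 (M = 2); verify the names
% against the base network you actually use.
predictorLayerNames = ["activation_22_relu","activation_40_relu"];

% Hypothetical anchor boxes, one cell array element per predictor layer,
% each row in the form [height width].
anchorBoxes = {[30 60; 60 30; 50 50; 100 100], ...
               [40 70; 70 40; 60 60; 120 120]};

% Create the SSD network with custom anchor boxes attached at the
% named predictor layers.
lgraph = ssdLayers(imageSize,numClasses,'resnet50', ...
                   anchorBoxes,predictorLayerNames);
```

The number of elements in anchorBoxes must match the number of names in predictorLayerNames, because each element supplies the boxes for one predictor layer.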

Output Arguments


SSD multibox object detection network (lgraph), returned as a LayerGraph (Deep Learning Toolbox) object.


The default value for the Normalization property of the image input layer in the returned lgraph object is set to the Normalization property of the base network specified in baseNetwork.
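One way to inspect this property is sketched below. The example assumes the VGG-16 support package is installed, and it locates the image input layer by class instead of assuming its position in the layer array.

```matlab
lgraph = ssdLayers([300 300 3],2,'vgg16');

% Find the image input layer by its class.
isInput = arrayfun(@(l) isa(l,'nnet.cnn.layer.ImageInputLayer'), ...
                   lgraph.Layers);
inputLayer = lgraph.Layers(isInput);

% Display the normalization inherited from the base network.
disp(inputLayer.Normalization)
```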


The ssdLayers function creates an SSD network and returns lgraph, an object that represents the network architecture for an SSD object detector.

The trainSSDObjectDetector function trains and returns an SSD object detector, ssdObjectDetector. Use the detect object function for the ssdObjectDetector object to detect objects using the detector trained with the SSD network architecture.

bbox = detect(detector,I)
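The workflow described above, from network creation through training to detection, can be sketched as a single function. The function name trainAndDetectSketch, the trainingData input, and the training option values are hypothetical placeholders, not recommended settings. trainingData stands for whatever labeled training data you have prepared for trainSSDObjectDetector.

```matlab
function bboxes = trainAndDetectSketch(trainingData,lgraph,I)
% trainingData : hypothetical labeled training data (images and boxes)
% lgraph       : SSD layer graph returned by ssdLayers
% I            : test image in which to detect objects

% Placeholder training options; tune these for your own data.
options = trainingOptions('sgdm', ...
    'MaxEpochs',10, ...
    'MiniBatchSize',16, ...
    'InitialLearnRate',1e-3);

% Train the detector using the SSD network architecture.
detector = trainSSDObjectDetector(trainingData,lgraph,options);

% Run the trained detector on the test image.
bboxes = detect(detector,I);
end
```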

The ssdLayers function uses a pretrained neural network as the base network and adds to it the detection subnetwork required for an SSD object detection network. Given a base network, ssdLayers removes all the layers that follow the feature layer in the base network and adds the detection subnetwork. The detection subnetwork consists of groups of serially connected convolution, rectified linear unit (ReLU), and batch normalization layers. An SSD merge layer, a box regression layer, and a focal loss classification layer are then added to the detection subnetwork.



Introduced in R2020a