semanticseg

Semantic image segmentation using deep learning

Syntax

C = semanticseg(I,network)
[C,score,allScores] = semanticseg(I,network)
[___] = semanticseg(I,network,roi)
pxds = semanticseg(imds,network)
[___] = semanticseg(___,Name,Value)

Description

C = semanticseg(I,network) returns a semantic segmentation of the input image using deep learning. The input network must be either a SeriesNetwork or DAGNetwork object.

[C,score,allScores] = semanticseg(I,network) returns a semantic segmentation of the input image with the classification score for each categorical label in C. The scores are returned as an array that corresponds to each pixel or voxel in the input image. allScores contains the scores for all label categories that the input network can classify.

[___] = semanticseg(I,network,roi) returns a semantic segmentation for a rectangular subregion of the input image.
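
For example, the following sketch (with a hypothetical rectangle) segments only a 100-by-100 pixel subregion of the image:

roi = [50 50 100 100]; % [x y width height]
C = semanticseg(I,network,roi);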

pxds = semanticseg(imds,network) returns the semantic segmentation for a collection of images in imds, an ImageDatastore object.

This function supports parallel computing using multiple MATLAB® workers when processing an ImageDatastore object. You can enable parallel computing using the Computer Vision Toolbox Preferences dialog.

[___] = semanticseg(___,Name,Value) returns the semantic segmentation with additional options specified by one or more Name,Value pair arguments.
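
For example, the following sketch forces GPU execution using the 'ExecutionEnvironment' option described under Name-Value Pair Arguments:

C = semanticseg(I,network,'ExecutionEnvironment','gpu');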

Examples

Segment an image using a pretrained network, then overlay the segmentation results on the image.

Load a pretrained network.

data = load('triangleSegmentationNetwork');
net = data.net
net = 
  SeriesNetwork with properties:

    Layers: [10x1 nnet.cnn.layer.Layer]

List the network layers.

net.Layers
ans = 
  10x1 Layer array with layers:

     1   'imageinput'        Image Input                  32x32x1 images with 'zerocenter' normalization
     2   'conv_1'            Convolution                  64 3x3x1 convolutions with stride [1  1] and padding [1  1  1  1]
     3   'relu_1'            ReLU                         ReLU
     4   'maxpool'           Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'            Convolution                  64 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'relu_2'            ReLU                         ReLU
     7   'transposed-conv'   Transposed Convolution       64 4x4x64 transposed convolutions with stride [2  2] and cropping [1  1  1  1]
     8   'conv_3'            Convolution                  2 1x1x64 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'softmax'           Softmax                      softmax
    10   'classoutput'       Pixel Classification Layer   Class weighted cross-entropy loss with classes 'triangle' and 'background'

Read and display the test image.

I = imread('triangleTest.jpg');
figure
imshow(I)

Perform semantic image segmentation.

[C,scores] = semanticseg(I,net);

Overlay segmentation results on the image and display the results.

B = labeloverlay(I, C);
figure
imshow(B)

Display the classification scores.

figure
imagesc(scores)
axis square
colorbar

Create a binary mask with only the triangles.

BW = C == 'triangle';
figure
imshow(BW)

Run semantic segmentation on a collection of test images and compare the results against the ground truth.

Load a pretrained network.

data = load('triangleSegmentationNetwork');
net = data.net;

Load test images using imageDatastore.

dataDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
testImageDir = fullfile(dataDir,'testImages');
imds = imageDatastore(testImageDir)
imds = 
  ImageDatastore with properties:

                       Files: {
                              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_001.jpg';
                              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_002.jpg';
                              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_003.jpg'
                               ... and 97 more
                              }
    AlternateFileSystemRoots: {}
                    ReadSize: 1
                      Labels: {}
                     ReadFcn: @readDatastoreImage

Load ground truth test labels.

testLabelDir = fullfile(dataDir,'testLabels');
classNames = ["triangle" "background"];
pixelLabelID = [255 0];
pxdsTruth = pixelLabelDatastore(testLabelDir,classNames,pixelLabelID);

Run semantic segmentation on all of the test images.

pxdsResults = semanticseg(imds,net,'WriteLocation',tempdir);
Running semantic segmentation network
-------------------------------------
* Processing 100 images.
* Progress: 100.00%

Compare results against ground truth.

metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth)
Evaluating semantic segmentation results
----------------------------------------
[==================================================] 100%
Elapsed time: 00:00:01
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.90624          0.95085       0.61588      0.87529        0.40652  
metrics = 
  semanticSegmentationMetrics with properties:

              ConfusionMatrix: [2x2 table]
    NormalizedConfusionMatrix: [2x2 table]
               DataSetMetrics: [1x5 table]
                 ClassMetrics: [2x3 table]
                 ImageMetrics: [100x5 table]

This example shows how to define and create a custom pixel classification layer that uses Dice loss.

This layer can be used to train semantic segmentation networks. To learn more about creating custom deep learning layers, see Define Custom Deep Learning Layers (Deep Learning Toolbox).

Dice Loss

The Dice loss is based on the Sørensen-Dice similarity coefficient for measuring overlap between two segmented images. The generalized Dice loss [1,2], L, between one image Y and the corresponding ground truth T is given by

$$L = 1 - \frac{2\sum_{k=1}^{K} w_k \sum_{m=1}^{M} Y_{km} T_{km}}{\sum_{k=1}^{K} w_k \sum_{m=1}^{M} \left(Y_{km}^{2} + T_{km}^{2}\right)}\,,$$

where K is the number of classes, M is the number of elements along the first two dimensions of Y, and $w_k$ is a class-specific weighting factor that controls the contribution each class makes to the loss. $w_k$ is typically the inverse area of the expected region:

$$w_k = \frac{1}{\left(\sum_{m=1}^{M} T_{km}\right)^{2}}$$

This weighting helps counter the influence of larger regions on the Dice score, making it easier for the network to learn how to segment smaller regions.
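
As a numeric illustration (a hypothetical 2-by-2 image with K = 2 one-hot classes, not part of the original example), the loss for a single observation can be computed directly from the definition:

% Hypothetical toy data: H = W = 2, K = 2 classes, N = 1 observation.
Y = cat(3,[0.9 0.8; 0.2 0.1],[0.1 0.2; 0.8 0.9]); % predicted scores
T = cat(3,[1 1; 0 0],[0 0; 1 1]);                 % one-hot ground truth
W = 1 ./ sum(sum(T,1),2).^2;                      % inverse-area class weights
numer = 2*sum(W .* sum(sum(Y.*T,1),2),3);
denom = sum(W .* sum(sum(Y.^2 + T.^2,1),2),3);
loss = 1 - numer/denom % approximately 0.0286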

Classification Layer Template

Copy the classification layer template into a new file in MATLAB®. This template outlines the structure of a classification layer and includes the functions that define the layer behavior. The rest of the example shows how to complete the dicePixelClassificationLayer.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer

   properties
      % Optional properties
   end

   methods

        function loss = forwardLoss(layer, Y, T)
            % Layer forward loss function goes here.
        end
        
        function dLdY = backwardLoss(layer, Y, T)
            % Layer backward loss function goes here.
        end
    end
end

Declare Layer Properties

By default, custom output layers have the following properties:

  • Name – Layer name, specified as a character vector or a string scalar. To include this layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with this layer and Name is set to '', then the software automatically assigns a name at training time.

  • Description – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name.

  • Type – Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays 'Classification layer' or 'Regression layer'.

Custom classification layers also have the following property:

  • Classes – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. If Classes is 'auto', then the software automatically sets the classes at training time. If you specify a string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str,str). The default value is 'auto'.

If the layer has no other properties, then you can omit the properties section.

The Dice loss requires a small constant value to prevent division by zero. Specify the property Epsilon to hold this value.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer

    properties(Constant)
       % Small constant to prevent division by zero. 
       Epsilon = 1e-8;

    end

    ...
end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

Specify an optional input argument name to assign to the Name property at creation.

        function layer = dicePixelClassificationLayer(name)
            % layer =  dicePixelClassificationLayer(name) creates a Dice
            % pixel classification layer with the specified name.
            
            % Set layer name.          
            layer.Name = name;
            
            % Set layer description.
            layer.Description = 'Dice loss';
        end

Create Forward Loss Function

Create a function named forwardLoss that returns the Dice loss between the predictions made by the network and the training targets. The syntax for forwardLoss is loss = forwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.

For semantic segmentation problems, the dimensions of T match the dimensions of Y, where Y is a 4-D array of size H-by-W-by-K-by-N, K is the number of classes, and N is the mini-batch size.

The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K or a convolutional layer with K filters followed by a softmax layer before the output layer.
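
For example, a minimal sketch of the final layers for an assumed K classes, mirroring the network used later in this example:

K = 2; % assumed number of classes
finalLayers = [
    convolution2dLayer(1,K) % K 1-by-1 filters produce H-by-W-by-K scores
    softmaxLayer            % converts scores to per-pixel probabilities
    dicePixelClassificationLayer('dice')];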

        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the Dice loss between
            % the predictions Y and the training targets T.   

            % Weights by inverse of region size.
            W = 1 ./ sum(sum(T,1),2).^2;
            
            intersection = sum(sum(Y.*T,1),2);
            union = sum(sum(Y.^2 + T.^2, 1),2);          
            
            numer = 2*sum(W.*intersection,3) + layer.Epsilon;
            denom = sum(W.*union,3) + layer.Epsilon;
            
            % Compute Dice score.
            dice = numer./denom;
            
            % Return average Dice loss.
            N = size(Y,4);
            loss = sum((1-dice))/N;
            
        end

Create Backward Loss Function

Create the backward loss function that returns the derivatives of the Dice loss with respect to the predictions Y. The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.

The dimensions of Y and T are the same as the inputs in forwardLoss.
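
For reference, applying the quotient rule to the per-observation Dice score numer/denom computed in forwardLoss gives the gradient implemented below (this derivation is added here for clarity and is not part of the original template):

$$\frac{\partial L}{\partial Y_{km}} = \frac{1}{N}\left(\frac{2\,w_k\,Y_{km}\,\mathrm{numer}}{\mathrm{denom}^{2}} - \frac{2\,w_k\,T_{km}}{\mathrm{denom}}\right)$$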

        function dLdY = backwardLoss(layer, Y, T)
            % dLdY = backwardLoss(layer, Y, T) returns the derivatives of
            % the Dice loss with respect to the predictions Y.
            
            % Weights by inverse of region size.
            W = 1 ./ sum(sum(T,1),2).^2;
            
            intersection = sum(sum(Y.*T,1),2);
            union = sum(sum(Y.^2 + T.^2, 1),2);
     
            numer = 2*sum(W.*intersection,3) + layer.Epsilon;
            denom = sum(W.*union,3) + layer.Epsilon;
            
            N = size(Y,4);
      
            dLdY = (2*W.*Y.*numer./denom.^2 - 2*W.*T./denom)./N;
        end

Completed Layer

The completed layer is provided in dicePixelClassificationLayer.m.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer
    % This layer implements the generalized dice loss function for training
    % semantic segmentation networks.
    
    properties(Constant)
        % Small constant to prevent division by zero. 
        Epsilon = 1e-8;
    end
    
    methods
        
        function layer = dicePixelClassificationLayer(name)
            % layer =  dicePixelClassificationLayer(name) creates a Dice
            % pixel classification layer with the specified name.
            
            % Set layer name.          
            layer.Name = name;
            
            % Set layer description.
            layer.Description = 'Dice loss';
        end
        
        
        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the Dice loss between
            % the predictions Y and the training targets T.   

            % Weights by inverse of region size.
            W = 1 ./ sum(sum(T,1),2).^2;
            
            intersection = sum(sum(Y.*T,1),2);
            union = sum(sum(Y.^2 + T.^2, 1),2);          
            
            numer = 2*sum(W.*intersection,3) + layer.Epsilon;
            denom = sum(W.*union,3) + layer.Epsilon;
            
            % Compute Dice score.
            dice = numer./denom;
            
            % Return average Dice loss.
            N = size(Y,4);
            loss = sum((1-dice))/N;
            
        end
        
        function dLdY = backwardLoss(layer, Y, T)
            % dLdY = backwardLoss(layer, Y, T) returns the derivatives of
            % the Dice loss with respect to the predictions Y.
            
            % Weights by inverse of region size.
            W = 1 ./ sum(sum(T,1),2).^2;
            
            intersection = sum(sum(Y.*T,1),2);
            union = sum(sum(Y.^2 + T.^2, 1),2);
     
            numer = 2*sum(W.*intersection,3) + layer.Epsilon;
            denom = sum(W.*union,3) + layer.Epsilon;
            
            N = size(Y,4);
      
            dLdY = (2*W.*Y.*numer./denom.^2 - 2*W.*T./denom)./N;
        end
    end
end

GPU Compatibility

For GPU compatibility, the layer functions must support inputs and return outputs of type gpuArray. Any other functions used by the layer must do the same.

The MATLAB functions used in forwardLoss and backwardLoss in dicePixelClassificationLayer all support gpuArray inputs, so the layer is GPU compatible.

Check Output Layer Validity

Create an instance of the layer.

layer = dicePixelClassificationLayer('dice');

Check the validity of the layer using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects an H-by-W-by-K-by-N array input, where K is the number of classes and N is the number of observations in the mini-batch.

numClasses = 2;
validInputSize = [4 4 numClasses];
checkLayer(layer,validInputSize, 'ObservationDimension',4)
Running nnet.checklayer.OutputLayerTestCase
.......... .......
Done nnet.checklayer.OutputLayerTestCase
__________

Test Summary:
	 17 Passed, 0 Failed, 0 Incomplete, 0 Skipped.
	 Time elapsed: 1.6227 seconds.

The test summary reports the number of passed, failed, incomplete, and skipped tests.

Use Custom Layer in Semantic Segmentation Network

Create a semantic segmentation network that uses the dicePixelClassificationLayer.

layers = [
    imageInputLayer([32 32 1])
    convolution2dLayer(3,64,'Padding',1)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,64,'Padding',1)
    reluLayer
    transposedConv2dLayer(4,64,'Stride',2,'Cropping',1)
    convolution2dLayer(1,2)
    softmaxLayer
    dicePixelClassificationLayer('dice')]
layers = 
  10x1 Layer array with layers:

     1   ''       Image Input              32x32x1 images with 'zerocenter' normalization
     2   ''       Convolution              64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   ''       ReLU                     ReLU
     4   ''       Max Pooling              2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   ''       Convolution              64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   ''       ReLU                     ReLU
     7   ''       Transposed Convolution   64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     8   ''       Convolution              2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
     9   ''       Softmax                  softmax
    10   'dice'   Classification Output    Dice loss

Load training data for semantic segmentation using imageDatastore and pixelLabelDatastore.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

imds = imageDatastore(imageDir);

classNames = ["triangle" "background"];
labelIDs = [255 0];
pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Associate the image and pixel label data using pixelLabelImageDatastore.

ds = pixelLabelImageDatastore(imds,pxds);

Set the training options and train the network.

options = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-2, ...
    'MaxEpochs',100, ...
    'LearnRateDropFactor',1e-1, ...
    'LearnRateDropPeriod',50, ...
    'LearnRateSchedule','piecewise', ...
    'MiniBatchSize',128);

net = trainNetwork(ds,layers,options);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:03 |       27.89% |       0.8346 |          0.0100 |
|      50 |          50 |       00:00:34 |       89.67% |       0.6384 |          0.0100 |
|     100 |         100 |       00:01:09 |       94.35% |       0.5024 |          0.0010 |
|========================================================================================|

Evaluate the trained network by segmenting a test image and displaying the segmentation result.

I = imread('triangleTest.jpg');

[C,scores] = semanticseg(I,net);

B = labeloverlay(I,C);
figure
imshow(imtile({I,B}))

References

  1. Crum, William R., Oscar Camara, and Derek L. G. Hill. "Generalized overlap measures for evaluation and validation in medical image analysis." IEEE Transactions on Medical Imaging 25, no. 11 (2006): 1451-1461.

  2. Sudre, Carole H., et al. "Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2017. 240-248.

This example shows how to train a semantic segmentation network using dilated convolutions.

A semantic segmentation network classifies every pixel in an image, resulting in an image that is segmented by class. Applications for semantic segmentation include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis. To learn more, see Semantic Segmentation Basics.

Semantic segmentation networks like DeepLab [1] make extensive use of dilated convolutions (also known as atrous convolutions) because they increase the receptive field of the layer (the area of the input that the layer can see) without increasing the number of parameters or computations.
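
For example, a 3-by-3 filter with dilation factor d covers an effective area of d*(3-1)+1 pixels per side while still using only nine weights per channel. A minimal sketch:

layer = convolution2dLayer(3,32,'DilationFactor',4,'Padding','same');
% Effective filter extent per side: 4*(3-1)+1 = 9 pixels, with only 3-by-3 weights.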

Load Training Data

The example uses a simple dataset of 32-by-32 triangle images for illustration purposes. The dataset includes accompanying pixel label ground truth data. Load the training data using an imageDatastore and a pixelLabelDatastore.

dataFolder = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageFolderTrain = fullfile(dataFolder,'trainingImages');
labelFolderTrain = fullfile(dataFolder,'trainingLabels');

Create an image datastore for the images.

imdsTrain = imageDatastore(imageFolderTrain);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle" "background"];
labels = [255 0];
pxdsTrain = pixelLabelDatastore(labelFolderTrain,classNames,labels)
pxdsTrain = 
  PixelLabelDatastore with properties:

                       Files: {200×1 cell}
                  ClassNames: {2×1 cell}
                    ReadSize: 1
                     ReadFcn: @readDatastoreImage
    AlternateFileSystemRoots: {}

Create Semantic Segmentation Network

This example uses a simple semantic segmentation network based on dilated convolutions.

Create a data source for training data and get the pixel counts for each label.

pximdsTrain = pixelLabelImageDatastore(imdsTrain,pxdsTrain);
tbl = countEachLabel(pximdsTrain)
tbl=2×3 table
        Name        PixelCount    ImagePixelCount
    ____________    __________    _______________

    'triangle'           10326       2.048e+05   
    'background'    1.9447e+05       2.048e+05   

The majority of pixel labels are for the background. This class imbalance biases the learning process in favor of the dominant class. To fix this, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting, where the class weights are the inverse of the class frequencies. This method increases the weight given to under-represented classes. Calculate the class weights using inverse frequency weighting.

numberPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / numberPixels;
classWeights = 1 ./ frequency;

Create a network for pixel classification with an image input layer whose input size corresponds to the size of the input images. Next, specify three blocks of convolution, batch normalization, and ReLU layers. For each convolutional layer, specify 32 3-by-3 filters with increasing dilation factors, and pad the inputs so they are the same size as the outputs by setting the 'Padding' option to 'same'. To classify the pixels, include a convolutional layer with K 1-by-1 convolutions, where K is the number of classes, followed by a softmax layer and a pixelClassificationLayer with the inverse class weights.

inputSize = [32 32 1];
filterSize = 3;
numFilters = 32;
numClasses = numel(classNames);

layers = [
    imageInputLayer(inputSize)
    
    convolution2dLayer(filterSize,numFilters,'DilationFactor',1,'Padding','same')
    batchNormalizationLayer
    reluLayer
    
    convolution2dLayer(filterSize,numFilters,'DilationFactor',2,'Padding','same')
    batchNormalizationLayer
    reluLayer
    
    convolution2dLayer(filterSize,numFilters,'DilationFactor',4,'Padding','same')
    batchNormalizationLayer
    reluLayer
    
    convolution2dLayer(1,numClasses)
    softmaxLayer
    pixelClassificationLayer('Classes',classNames,'ClassWeights',classWeights)];

Train Network

Specify the training options. Using the SGDM solver, train for 100 epochs with a mini-batch size of 64 and an initial learning rate of 0.001.

options = trainingOptions('sgdm', ...
    'MaxEpochs', 100, ...
    'MiniBatchSize', 64, ... 
    'InitialLearnRate', 1e-3);

Train the network using trainNetwork.

net = trainNetwork(pximdsTrain,layers,options);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       67.54% |       0.7098 |          0.0010 |
|      17 |          50 |       00:00:03 |       84.60% |       0.3851 |          0.0010 |
|      34 |         100 |       00:00:06 |       89.85% |       0.2536 |          0.0010 |
|      50 |         150 |       00:00:09 |       93.39% |       0.1959 |          0.0010 |
|      67 |         200 |       00:00:11 |       95.89% |       0.1559 |          0.0010 |
|      84 |         250 |       00:00:14 |       97.29% |       0.1188 |          0.0010 |
|     100 |         300 |       00:00:18 |       98.28% |       0.0970 |          0.0010 |
|========================================================================================|

Test Network

Load the test data. Create an image datastore for the images. Create a pixelLabelDatastore for the ground truth pixel labels.

imageFolderTest = fullfile(dataFolder,'testImages');
imdsTest = imageDatastore(imageFolderTest);
labelFolderTest = fullfile(dataFolder,'testLabels');
pxdsTest = pixelLabelDatastore(labelFolderTest,classNames,labels);

Make predictions using the test data and trained network.

pxdsPred = semanticseg(imdsTest,net,'WriteLocation',tempdir);
Running semantic segmentation network
-------------------------------------
* Processing 100 images.
* Progress: 100.00%

Evaluate the prediction accuracy using evaluateSemanticSegmentation.

metrics = evaluateSemanticSegmentation(pxdsPred,pxdsTest);
Evaluating semantic segmentation results
----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processing 100 images...
[==================================================] 100%
Elapsed time: 00:00:00
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.98334          0.99107       0.85869      0.97109        0.68197  

For more information on evaluating semantic segmentation networks, see evaluateSemanticSegmentation.

Segment New Image

Read and display the test image triangleTest.jpg.

imgTest = imread('triangleTest.jpg');
figure
imshow(imgTest)

Segment the test image using semanticseg and display the results using labeloverlay.

C = semanticseg(imgTest,net);
B = labeloverlay(imgTest,C);
figure
imshow(B)

References

  1. Chen, Liang-Chieh, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs." IEEE Transactions on Pattern Analysis and Machine Intelligence 40, no. 4 (2018): 834-848.

Input Arguments

I – Input image, specified as one of the following.

  • Single 2-D grayscale image – 2-D matrix of size H-by-W
  • Single 2-D color image or 2-D multispectral image – 3-D array of size H-by-W-by-C. The number of color channels C is 3 for color images.
  • Series of P 2-D images – 4-D array of size H-by-W-by-C-by-P. The number of color channels C is 1 for grayscale images and 3 for color images.
  • Single 3-D grayscale image with depth D – 3-D array of size H-by-W-by-D
  • Single 3-D color image or 3-D multispectral image – 4-D array of size H-by-W-by-D-by-C. The number of color channels C is 3 for color images.
  • Series of P 3-D images – 5-D array of size H-by-W-by-D-by-C-by-P

The input image can also be a gpuArray containing one of the preceding image types (requires Parallel Computing Toolbox™).
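
For example, a minimal sketch assuming Parallel Computing Toolbox is installed:

Igpu = gpuArray(imread('triangleTest.jpg'));
C = semanticseg(Igpu,network);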

Data Types: uint8 | uint16 | int16 | double | single | logical

network – Network, specified as either a SeriesNetwork or a DAGNetwork object.

roi – Region of interest, specified as one of the following.

  • 2-D image – 4-element vector of the form [x,y,width,height]
  • 3-D image – 6-element vector of the form [x,y,z,width,height,depth]

The vector defines a rectangular or cuboidal region of interest fully contained in the input image. Image pixels outside the region of interest are assigned the <undefined> categorical label. If the input image consists of a series of images, then semanticseg applies the same roi to all images in the series.

imds – Collection of images, specified as an ImageDatastore object. The function returns the semantic segmentation of each image in the collection as a pixelLabelDatastore object (see pxds under Output Arguments).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ExecutionEnvironment','gpu'

'OutputType' – Returned segmentation type, specified as 'categorical', 'double', or 'uint8'. When you select 'double' or 'uint8', the function returns the segmentation results as a label array containing label IDs. The IDs are integer values that correspond to the class names defined in the classification layer used in the input network.

You cannot use the 'OutputType' argument with an ImageDatastore object input.
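
For example, a minimal sketch that returns a uint8 label matrix instead of categorical labels:

L = semanticseg(I,network,'OutputType','uint8');
% L(i,j) is an integer ID corresponding to a class defined in the
% classification layer of the network.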

'MiniBatchSize' – Size of image groups, specified as an integer. Images are grouped and processed together as a batch. Batches are used for processing a large collection of images and improve computational efficiency. Increasing the 'MiniBatchSize' value increases the efficiency, but it also uses more memory.
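
For example, a minimal sketch that processes an image datastore in batches of 32 images:

pxds = semanticseg(imds,network,'MiniBatchSize',32,'WriteLocation',tempdir);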

'ExecutionEnvironment' – Hardware resource used to process images with a network, specified as 'auto', 'gpu', or 'cpu'.

  • 'auto' – Use a GPU if one is available. Otherwise, use the CPU. GPU use requires Parallel Computing Toolbox and a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher.
  • 'gpu' – Use the GPU. If a suitable GPU is not available, the function returns an error message.
  • 'cpu' – Use the CPU.

'WriteLocation' – Folder location, specified as pwd (your current working folder), a string scalar, or a character vector. The specified folder must exist and have write permissions.

This property applies only when using an ImageDatastore object input.

'NamePrefix' – Prefix applied to output file names, specified as a string scalar or character vector. The image files are named as follows:

  • prefix_N.png, where N corresponds to the index of the input image file, imds.Files(N).

This property applies only when using an ImageDatastore object input.
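
For example, a minimal sketch combining 'WriteLocation' and 'NamePrefix' (the prefix value is hypothetical):

pxds = semanticseg(imds,network,'WriteLocation',tempdir,'NamePrefix','triangleSeg');
% Results are written as triangleSeg_1.png, triangleSeg_2.png, and so on.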

'Verbose' – Display progress information, specified as 'true' or 'false'.

This property applies only when using an ImageDatastore object input.

Output Arguments

C – Categorical labels, returned as a categorical array. The elements of the label array correspond to the pixel or voxel elements of the input image. If you specify an ROI, then the labels are limited to the area within the ROI. Image pixels and voxels outside the region of interest are assigned the <undefined> categorical label.

  • Single 2-D image – 2-D matrix of size H-by-W. Element C(i,j) is the categorical label assigned to the pixel I(i,j).
  • Series of P 2-D images – 3-D array of size H-by-W-by-P. Element C(i,j,p) is the categorical label assigned to the pixel I(i,j,p).
  • Single 3-D image – 3-D array of size H-by-W-by-D. Element C(i,j,k) is the categorical label assigned to the voxel I(i,j,k).
  • Series of P 3-D images – 4-D array of size H-by-W-by-D-by-P. Element C(i,j,k,p) is the categorical label assigned to the voxel I(i,j,k,p).

score – Classification scores for each categorical label in C, returned as an array. The scores represent the confidence in the predicted labels C.

  • Single 2-D image – 2-D matrix of size H-by-W. Element score(i,j) is the classification score of the pixel I(i,j).
  • Series of P 2-D images – 3-D array of size H-by-W-by-P. Element score(i,j,p) is the classification score of the pixel I(i,j,p).
  • Single 3-D image – 3-D array of size H-by-W-by-D. Element score(i,j,k) is the classification score of the voxel I(i,j,k).
  • Series of P 3-D images – 4-D array of size H-by-W-by-D-by-P. Element score(i,j,k,p) is the classification score of the voxel I(i,j,k,p).

allScores – Scores for all label categories that the input network can classify, returned as a numeric array. The format of the array is described in the following list, with L representing the total number of label categories.

  • Single 2-D image – 3-D array of size H-by-W-by-L. Element allScores(i,j,q) is the score of the qth label at the pixel I(i,j).
  • Series of P 2-D images – 4-D array of size H-by-W-by-L-by-P. Element allScores(i,j,q,p) is the score of the qth label at the pixel I(i,j,p).
  • Single 3-D image – 4-D array of size H-by-W-by-D-by-L. Element allScores(i,j,k,q) is the score of the qth label at the voxel I(i,j,k).
  • Series of P 3-D images – 5-D array of size H-by-W-by-D-by-L-by-P. Element allScores(i,j,k,q,p) is the score of the qth label at the voxel I(i,j,k,p).

pxds – Semantic segmentation results, returned as a pixelLabelDatastore object. The object contains the semantic segmentation results for all the images contained in the imds input object. The result for each image is saved as a separate uint8 label matrix in a PNG file. You can use read(pxds) to return the categorical labels assigned to the images in imds.
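
For example, a minimal sketch that reads back the labels for the first image in imds:

C = read(pxds); % categorical labels for the first image in the datastore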

Introduced in R2017b