# Generate Adversarial Examples for Semantic Segmentation

This example shows how to generate adversarial examples for a semantic segmentation network using the basic iterative method (BIM).

Semantic segmentation is the process of assigning each pixel in an image a class label, for example, car, bike, person, or sky. Applications for semantic segmentation include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis.

Neural networks can be susceptible to a phenomenon known as adversarial examples [1], where very small changes to an input can cause it to be misclassified. These changes are often imperceptible to humans. This example shows how to generate an adversarial example for a semantic segmentation network.

This example generates adversarial examples using the CamVid [2] data set from the University of Cambridge. The CamVid data set is a collection of images containing street-level views obtained while driving. The data set provides pixel-level labels for 32 semantic classes including car, pedestrian, and road.

Download a pretrained semantic segmentation network. This example uses a DeepLab v3+ network trained on the CamVid data set with weights initialized from a pretrained ResNet-18 network. For more information on building and training a DeepLab v3+ semantic segmentation network, see Semantic Segmentation Using Deep Learning (Deep Learning Toolbox).

pretrainedURL = "https://www.mathworks.com/supportfiles/vision/data/deeplabv3plusResnet18CamVid.mat";
pretrainedFolder = fullfile(tempdir,"pretrainedNetwork");
pretrainedNetwork = fullfile(pretrainedFolder,"deeplabv3plusResnet18CamVid.mat");
if ~exist(pretrainedNetwork,"file")
    mkdir(pretrainedFolder);
    websave(pretrainedNetwork,pretrainedURL);
end

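Load the network and convert it to a dlnetwork object. Remove the last layer of the layer graph before the conversion, because a dlnetwork object does not contain output layers.

data = load(pretrainedNetwork);
net = data.net;

lgraph = layerGraph(net);
lgraph = removeLayers(lgraph,lgraph.Layers(end).Name);
net = dlnetwork(lgraph);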

Load an image and its corresponding label image. The image is a street-level view obtained from a car being driven. The label image contains the ground truth pixel labels. In this example, you create an adversarial example that causes the semantic segmentation network to misclassify the pixels in the Bicyclist class.

Use the supporting function convertCamVidLabelImage, defined at the end of this example, to convert the label image to a categorical array.

The data set contains 32 classes. Use the supporting function camVidClassNames11, defined at the end of this example, to reduce the number of classes to 11 by grouping multiple classes from the original data set together.

classNames = camVidClassNames11;

Use the supporting function camVidColorMap11 to create a colormap for the 11 classes.

cmap = camVidColorMap11;

Display the image with an overlay showing the pixels with the ground truth label Bicyclist.

classOfInterest = "Bicyclist";
notTheClassOfInterest = T ~= classOfInterest;

TClassOfInterest = T;
TClassOfInterest(notTheClassOfInterest) = "";

overlayImage = labeloverlay(img,TClassOfInterest,ColorMap=cmap);
imshow(overlayImage)

To create an adversarial example, you must specify the adversarial target label for each pixel you want the network to misclassify. In this example, the aim is to get the network to misclassify the Bicyclist pixels as another class. Therefore, you need to specify target classes for each of the Bicyclist pixels.

Using the supporting function eraseClass, defined at the end of this example, create adversarial target labels by replacing all Bicyclist pixel labels with the label of the nearest pixel that is not in the Bicyclist class [3].

TDesired = eraseClass(T,classOfInterest);

overlayImage = labeloverlay(img,TDesired,ColorMap=cmap);
figure
imshow(overlayImage)
pixelLabelColorbar(cmap,classNames);

The labels of the Bicyclist pixels are now Road, Building, or Pavement.
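The nearest-pixel relabeling can be illustrated on a tiny label array. This is a sketch that assumes the relabeling is done with bwdist from Image Processing Toolbox; the 2-by-4 array, its labels, and the variable names are invented for illustration, and the actual eraseClass implementation may differ.

```matlab
% Toy 2-by-4 label image (hypothetical data, not from CamVid).
labels = ["Road" "Bicyclist" "Bicyclist" "Building"
          "Road" "Bicyclist" "Bicyclist" "Building"];
TToy = categorical(labels);

% Mask of the pixels to relabel.
eraseMask = TToy == "Bicyclist";

% For every pixel, bwdist returns the linear index of the nearest
% nonzero pixel of its input, here the nearest pixel outside the class.
[~,nearestIdx] = bwdist(~eraseMask);

% Copy the label of the nearest outside pixel into each erased pixel.
TToyDesired = TToy;
TToyDesired(eraseMask) = TToy(nearestIdx(eraseMask));
disp(TToyDesired)
```

In this toy case each Bicyclist pixel has a single closest neighbor outside the class, so the result does not depend on how ties are broken.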

#### Prepare Data

To create the adversarial example using the image and the adversarial target labels, you must first prepare the image and the labels.

Prepare the image by converting it to a dlarray.

X = dlarray(single(img), "SSCB");

Prepare the label by one-hot encoding it. Because some of the pixels have undefined labels, replace NaN values with 0.

TDesired = onehotencode(TDesired,3,"single",ClassNames=classNames);
TDesired(isnan(TDesired)) = 0;
TDesired = dlarray(TDesired,"SSCB");
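To make the effect of the encoding concrete, the following sketch applies the same calls to a 2-by-2 label array with one undefined pixel. The three-class list and the toy labels are assumptions made for illustration only.

```matlab
% Hypothetical three-class setup and a 2-by-2 label image.
toyClasses = ["Road" "Building" "Bicyclist"];
TToy = categorical(["Road" "Building"; "Bicyclist" "Road"],toyClasses);
TToy(2,2) = "";   % mark one pixel as undefined, as the example does for its overlay

% One-hot encode along the third (channel) dimension.
TEnc = onehotencode(TToy,3,"single",ClassNames=toyClasses);

% Undefined pixels are encoded as NaN in every channel; zero them out.
TEnc(isnan(TEnc)) = 0;

disp(size(TEnc))      % height-by-width-by-numClasses
disp(sum(TEnc,3))     % 1 for labeled pixels, 0 for the undefined pixel
```

After the replacement, an undefined pixel has an all-zero target vector.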

Use the adversarial target labels to create an adversarial example using the basic iterative method (BIM) [4]. The BIM iteratively calculates the gradient ${\nabla }_{X}L\left(X,T\right)$ of the loss function $L$ with respect to the image $X$ you want to find an adversarial example for and the adversarial target labels $T$. The negative of this gradient describes the direction to "push" the image in to make the output closer to the desired class labels.

The adversarial example image is calculated iteratively as follows:

${X}_{\mathrm{adv}}=X-\alpha \cdot \mathrm{sign}\left({\nabla }_{X}L\left(X,T\right)\right)$.

Parameter $\alpha$ controls the size of the push for a single iteration. After each iteration, clip the perturbation to ensure the magnitude does not exceed $\epsilon$. Parameter $\epsilon$ defines a ceiling on how large the total change can be over all the iterations. A larger $\epsilon$ value increases the chance of generating a misclassified image, but makes the change in the image more visible.
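The update and clipping steps can be tried on a problem small enough to check by hand. The sketch below is not the segmentation network: it assumes a toy linear model with a squared-error loss whose gradient is known in closed form, and all values and names are chosen for illustration.

```matlab
% Toy setup (all values hypothetical): linear model y = w'*x with target t.
x = [1; 2; 3];         % stand-in for the image
w = [0.5; -1; 2];      % stand-in for the network
t = 0;                 % adversarial target output

lossFcn = @(x) (w'*x - t)^2;        % squared-error loss L(x,t)
gradFcn = @(x) 2*(w'*x - t)*w;      % analytic gradient of L with respect to x

epsilonToy = 0.5;      % ceiling on the total perturbation
alphaToy = 0.1;        % step size per iteration
numSteps = 10;

deltaToy = zeros(size(x));
for k = 1:numSteps
    % Step against the sign of the gradient to move the output toward t.
    deltaToy = deltaToy - alphaToy*sign(gradFcn(x + deltaToy));
    % Clip the accumulated perturbation to [-epsilonToy, epsilonToy].
    deltaToy = max(min(deltaToy,epsilonToy),-epsilonToy);
end

xAdv = x + deltaToy;
fprintf("loss before: %g, loss after: %g\n",lossFcn(x),lossFcn(xAdv))
```

Because the perturbation is clipped after every step, running more iterations cannot push any element of the input further than the ceiling from its original value.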

Set the epsilon value to 5, set the step size alpha to 1, and perform 10 iterations.

epsilon = 5;
alpha = 1;
numIterations = 10;

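Keep track of the perturbation and clip any values that exceed epsilon.

delta = zeros(size(X),like=X);
for i = 1:numIterations
    % Gradient of the loss with respect to the perturbed image. The helper name
    % targetedGradients is assumed; it evaluates dlgradient on the mse loss.
    gradient = dlfeval(@targetedGradients,net,X+delta,TDesired);

    % Step against the gradient sign, then clip the perturbation to [-epsilon, epsilon].
    delta = delta - alpha*sign(gradient);
    delta(delta > epsilon) = epsilon;
    delta(delta < -epsilon) = -epsilon;
end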

Display the original image, the perturbation added to the image, and the adversarial image.

The added perturbation is imperceptible, demonstrating how adversarial examples can exploit robustness issues within a network.
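A three-panel display of this kind can be sketched on synthetic data. The random image and random perturbation below are stand-ins invented for illustration; with the real data you would use the image and the perturbation computed in the loop instead.

```matlab
rng(0)                                            % reproducible toy data
imgToy = uint8(randi([0 255],32,48,3));           % stand-in for the street image
epsilonToy = 5;                                   % same ceiling as in the example
deltaToy = single(randi([-epsilonToy epsilonToy],size(imgToy)));  % stand-in perturbation
advToy = uint8(single(imgToy) + deltaToy);        % uint8 conversion saturates to [0,255]

figure
tiledlayout(1,3,TileSpacing="compact")
nexttile
imshow(imgToy)
title("Original Image")
nexttile
imshow(rescale(deltaToy))                         % stretch to [0,1] so the pattern is visible
title("Perturbation (rescaled)")
nexttile
imshow(advToy)
title("Adversarial Image")
```

The perturbation is rescaled only for display. At its true scale, a change of at most 5 on a 0 to 255 range is about 2% of the intensity range.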

### Predict Pixel Labels

Predict the class labels of the original image and the adversarial image using the semantic segmentation network.

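XAdv = X + delta;   % adversarial image: original image plus the clipped perturbation
Y = semanticseg(extractdata(X),net);
YAdv = semanticseg(extractdata(XAdv),net);

Display an overlay of the predictions for both images.

overlayImage = labeloverlay(uint8(extractdata(X)),Y,ColorMap=cmap);
overlayAdvImage = labeloverlay(uint8(extractdata(XAdv)),YAdv,ColorMap=cmap);

figure
tiledlayout("flow",TileSpacing="tight")
nexttile
imshow(uint8(extractdata(X)))
title("Original Image")
nexttile
imshow(overlayImage)
pixelLabelColorbar(cmap,classNames);
title("Original Predicted Labels")
nexttile
imshow(uint8(extractdata(XAdv)))
title("Adversarial Image")
nexttile
imshow(overlayAdvImage)
pixelLabelColorbar(cmap,classNames);
title("Adversarial Predicted Labels")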

The network correctly identifies the bicyclist in the original image. However, because of the imperceptible perturbation, the network mislabels the bicyclist in the adversarial image.
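Beyond visual inspection, the effect of the attack can be summarized with simple pixel counts. The sketch below uses two small hand-written prediction arrays in place of the real network outputs; the arrays, the class list, and the variable names are assumptions for illustration.

```matlab
% Hypothetical predictions for a 2-by-2 image, sharing one category set.
toyClasses = ["Road" "Building" "Bicyclist"];
YToy    = categorical(["Road" "Bicyclist"; "Bicyclist" "Building"],toyClasses);
YAdvToy = categorical(["Road" "Road"; "Building" "Building"],toyClasses);

classOfInterest = "Bicyclist";

% Count the pixels predicted as the class of interest before and after.
numBefore = nnz(YToy == classOfInterest);
numAfter = nnz(YAdvToy == classOfInterest);

% Fraction of all pixels whose predicted label changed.
fractionChanged = mean(YToy ~= YAdvToy,"all");

fprintf("%s pixels: %d before, %d after; %.0f%% of pixels changed\n", ...
    classOfInterest,numBefore,numAfter,100*fractionChanged)
```

A targeted attack that works as intended drives the count for the class of interest toward zero while leaving most other pixels unchanged.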

### Supporting Functions

#### Convert CamVid Label Image to a Categorical

The supporting function convertCamVidLabelImage takes as input a label image from the CamVid data set and converts it to a categorical array.

function labelImage = convertCamVidLabelImage(image)

colorMap32 = camVidColorMap32;
map32To11 = cellfun(@(x,y)repmat(x,size(y,1),1), ...
num2cell((1:numel(colorMap32))'), ...
colorMap32, ...
UniformOutput=false);

colorMap32 = cat(1,colorMap32{:});
map32To11 = cat(1,map32To11{:});

labelImage = rgb2ind(double(image)./255,colorMap32);
labelImage = map32To11(labelImage+1);
labelImage = categorical(labelImage,1:11,camVidClassNames11);

end
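To see how the color lookup and the group mapping interact, here is a minimal sketch on a toy palette. The two-group palette, the 2-by-2 image, and the variable names are made up for illustration; only the rgb2ind call and the indexing pattern mirror the function above.

```matlab
% Toy palette: group 1 has one color, group 2 has two colors (hypothetical values).
groups = {[1 0 0]; [0 1 0; 0 0 1]};

% Build one group index per palette row, mirroring map32To11 above.
groupOfRow = cellfun(@(k,c) repmat(k,size(c,1),1), ...
    num2cell((1:numel(groups))'),groups,UniformOutput=false);
palette = cat(1,groups{:});          % 3-by-3 list of colors
groupOfRow = cat(1,groupOfRow{:});   % one group number per palette row

% 2-by-2 RGB image: red and green in the first row, blue and blue in the second.
rgb = cat(3,[1 0; 0 0],[0 1; 0 0],[0 0; 1 1]);

% rgb2ind returns zero-based palette indices, hence the +1 before the lookup.
paletteIdx = rgb2ind(rgb,palette,"nodither");
groupImage = groupOfRow(double(paletteIdx)+1);
disp(groupImage)
```

Green and blue belong to the same group, so both map to group 2, in the same way that several original CamVid classes collapse into one umbrella class.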

#### CamVid Color Map (32 classes)

The supporting function camVidColorMap32 returns the color map for the 32 original classes in the CamVid data set.

function cmap = camVidColorMap32

cmap = {

% Sky
[
128 128 128
]

% Building
[
0 128  64     % Bridge
128   0   0     % Building
64 192   0     % Wall
64   0  64     % Tunnel
192   0 128     % Archway
]

% Pole
[
192 192 128     % Column_Pole
0   0  64     % TrafficCone
]

% Road
[
128  64 128     % Road
128   0 192     % LaneMkgsDriv
192   0  64     % LaneMkgsNonDriv
]

% Pavement
[
0   0 192     % Sidewalk
64 192 128     % ParkingBlock
]

% Tree
[
128 128   0     % Tree
192 192   0     % VegetationMisc
]

% SignSymbol
[
192 128 128     % SignSymbol
128 128  64     % Misc_Text
0  64  64     % TrafficLight
]

% Fence
[
64  64 128     % Fence
]

% Car
[
64   0 128     % Car
64 128 192     % SUVPickupTruck
192 128 192     % Truck_Bus
192  64 128     % Train
128  64  64     % OtherMoving
]

% Pedestrian
[
64  64   0     % Pedestrian
192 128  64     % Child
64   0 192     % CartLuggagePram
64 128  64     % Animal
]

% Bicyclist
[
0 128 192     % Bicyclist
192   0 192     % MotorcycleScooter
]

% Void
[
0   0   0     % Void
]

};

% Normalize between [0 1].
cmap = cellfun(@(x)x./255,cmap,UniformOutput=false);

end

#### CamVid Color Map (11 classes)

The supporting function camVidColorMap11 returns the color map for the 11 umbrella classes in the CamVid data set.

function cmap = camVidColorMap11

cmap = [
128 128 128     % Sky
128   0   0     % Building
192 192 192     % Pole
128  64 128     % Road
60  40 222     % Pavement
128 128   0     % Tree
192 128 128     % SignSymbol
64  64 128     % Fence
64   0 128     % Car
64  64   0     % Pedestrian
0 128 192     % Bicyclist
];

% Normalize between [0 1].
cmap = cmap ./ 255;

end

#### CamVid Labels (11 classes)

The supporting function camVidClassNames11 returns the 11 umbrella classes of the CamVid data set.

function classNames = camVidClassNames11
classNames = [
"Sky"
"Building"
"Pole"
"Road"
"Pavement"
"Tree"
"SignSymbol"
"Fence"
"Car"
"Pedestrian"
"Bicyclist"
];
end

#### Pixel Label Colorbar Function

The supporting function pixelLabelColorbar adds a colorbar to the current axis. The colorbar is formatted to display the class names with the color.

function pixelLabelColorbar(cmap, classNames)
% Add a colorbar to the current axis. The colorbar is formatted
% to display the class names with the color.

colormap(gca,cmap)

% Add colorbar to current figure.
c = colorbar(gca);

% Use class names for tick marks.
c.TickLabels = classNames;
numClasses = size(cmap,1);

% Center tick labels.
c.Ticks = 1/(numClasses*2):1/numClasses:1;

% Remove tick mark.
c.TickLength = 0;
end
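The tick expression places one tick at the center of each color band. A quick check with a hypothetical class count of 4, assuming the colorbar spans 0 to 1:

```matlab
numClassesToy = 4;   % hypothetical class count for illustration

% Same expression as in pixelLabelColorbar: start half a band in, step one band.
ticksToy = 1/(numClassesToy*2):1/numClassesToy:1;

% Band centers computed directly: (k - 0.5)/numClasses for k = 1..numClasses.
centersToy = ((1:numClassesToy) - 0.5)/numClassesToy;

disp([ticksToy; centersToy])
```

Both rows agree, which confirms that each class name is drawn next to the middle of its color.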

#### Erase Class Function

The supporting function eraseClass removes class classToErase from the label image T by relabeling the pixels in class classToErase. For each pixel in class classToErase, the eraseClass function sets the pixel label to the class of the nearest pixel not in class classToErase.

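function TDesired = eraseClass(T,classToErase)
classToEraseMask = T == classToErase;   % pixels to relabel
% Linear index of the nearest labeled pixel outside the class (bwdist, assumed approach).
[~,idx] = bwdist(~(classToEraseMask | isundefined(T)));
TDesired = T;
TDesired(classToEraseMask) = T(idx(classToEraseMask));
end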

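#### Targeted Input Gradient Function

The supporting function targetedGradients calculates the gradient of the mean squared error between the network output and the adversarial target labels with respect to the input image.

function gradient = targetedGradients(net,X,target)
% Assumed signature: X is a traced dlarray image, target is the one-hot target.
Y = predict(net,X);
loss = mse(Y,target);
gradient = dlgradient(loss,X);
end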

Show an image, the corresponding adversarial image, and the difference between the two (perturbation).

figure
tiledlayout(1,3,TileSpacing="compact")
nexttile
imgTrue = uint8(extractdata(image));
imshow(imgTrue)
title("Original Image")

nexttile
imshow(perturbation)
title("Perturbation")

nexttile