Train Convolutional Neural Network for Regression

This example shows how to train a convolutional neural network to predict the angles of rotation of handwritten digits.

Regression tasks involve predicting continuous numerical values instead of discrete class labels. This example constructs a convolutional neural network architecture for regression, trains the network, and the uses the trained network to predict angles of rotated handwritten digits.

This diagram illustrates the flow of image data through a regression neural network.

Load Data

The data set contains synthetic images of handwritten digits together with the corresponding angles (in degrees) by which each image is rotated.

Load the training and test data from the MAT files DigitsDataTrain.mat and DigitsDataTest.mat, respectively. The variables anglesTrain and anglesTest are the rotation angles in degrees. The training and test data sets each contain 5000 images.

load DigitsDataTrain
load DigitsDataTest

Display some of the training images.

numObservations = size(XTrain,4);
idx = randperm(numObservations,49);
I = imtile(XTrain(:,:,:,idx));
figure
imshow(I);

Figure contains an axes object. The hidden axes object contains an object of type image.

Partition XTrain and anglesTrain into training and validation partitions using the trainingPartitions function, attached to this example as a supporting file. To access this function, open the example as a live script. Set aside 15% of the training data for validation.

[idxTrain,idxValidation] = trainingPartitions(numObservations,[0.85 0.15]);

XValidation = XTrain(:,:,:,idxValidation);
anglesValidation = anglesTrain(idxValidation);

XTrain = XTrain(:,:,:,idxTrain);
anglesTrain = anglesTrain(idxTrain);

Check Data Normalization

When training neural networks, it often helps to make sure that your data is normalized in all stages of the network. Normalization helps stabilize and speed up network training using gradient descent. If your data is poorly scaled, then the loss can become NaN and the network parameters can diverge during training. Common ways of normalizing data include rescaling the data so that its range becomes [0,1] or so that it has a mean of zero and standard deviation of one. You can normalize the following data:

Input data. Normalize the predictors before you input them to the network. In this example, the input images are already normalized to the range [0,1].
Layer outputs. You can normalize the outputs of each convolutional and fully connected layer by using a batch normalization layer.
Responses. If you use batch normalization layers to normalize the layer outputs in the end of the network, then the predictions of the network are normalized when training starts. If the response has a very different scale from these predictions, then network training can fail to converge. If your response is poorly scaled, then try normalizing it and see if network training improves. If you normalize the response before training, then you must transform the predictions of the trained network to obtain the predictions of the original response.

Plot the distribution of the response. The response (the rotation angle in degrees) is approximately uniformly distributed between -45 and 45, which works well without needing normalization. In classification problems, the outputs are class probabilities, which are always normalized.

figure
histogram(anglesTrain)
axis tight
ylabel("Counts")
xlabel("Rotation Angle")

Figure contains an axes object. The axes object with xlabel Rotation Angle, ylabel Counts contains an object of type histogram.

In general, the data does not have to be exactly normalized. However, if you train the network in this example to predict 100*anglesTrain or anglesTrain+500 instead of anglesTrain, then the loss becomes NaN and the network parameters diverge when training starts. These results occur even though the only difference between a network predicting $a Y + b$ and a network predicting $Y$ is a simple rescaling of the weights and biases of the final fully connected layer.

If the distribution of the input or response is very uneven or skewed, you can also perform nonlinear transformations (for example, taking logarithms) to the data before training the network.

Define Neural Network Architecture

Define the neural network architecture.

For image input, specify an image input layer.
Specify four convolution-batchnorm-ReLU blocks with increasing numbers of filters.
Between each block, specify an average pooling layer with pooling regions and stride of size 2.
At the end of the network, include a fully connected layer with an output size that matches the number of responses.

numResponses = 1;

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,Padding="same")
    batchNormalizationLayer
    reluLayer
    averagePooling2dLayer(2,Stride=2)
    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer
    averagePooling2dLayer(2,Stride=2)
    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer
    reluLayer
    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(numResponses)];

Specify Training Options

Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.

Set the initial learn rate to 0.001 and lower the learning rate after 20 epochs.
Monitor the network accuracy during training by specifying validation data and validation frequency. The software trains the network on the training data and calculates the accuracy on the validation data at regular intervals during training. The validation data is not used to update the network weights.
Display the training progress in a plot and monitor the root mean squared error.
Disable the verbose output.

miniBatchSize  = 128;
validationFrequency = floor(numel(anglesTrain)/miniBatchSize);

options = trainingOptions("sgdm", ...
    MiniBatchSize=miniBatchSize, ...
    InitialLearnRate=1e-3, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropFactor=0.1, ...
    LearnRateDropPeriod=20, ...
    Shuffle="every-epoch", ...
    ValidationData={XValidation,anglesValidation}, ...
    ValidationFrequency=validationFrequency, ...
    Plots="training-progress", ...
    Metrics="rmse", ...
    Verbose=false);

Train Neural Network

Train the neural network using the trainnet function. For regression, use mean squared error loss. By default, the trainnet function uses a GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.

net = trainnet(XTrain,anglesTrain,layers,"mse",options);

Test Network

Test the neural network using the testnet function. For regression, evaluate the root mean squared error (RMSE). By default, the testnet function uses a GPU if one is available. To select the execution environment manually, use the ExecutionEnvironment argument of the testnet function.

rmse = testnet(net,XTest,anglesTest,"rmse")

rmse = 
4.9274

Visualize the accuracy in a plot by making predictions with the test data and comparing the predictions with the targets. Make predictions using the minibatchpredict function. By default, the minibatchpredict function uses a GPU if one is available.

YTest = minibatchpredict(net,XTest);

Plot the predicted values against the targets.

figure
scatter(YTest,anglesTest,"+")
xlabel("Prediction")
ylabel("Target")

hold on
plot([-60 60], [-60 60],"r--")

Figure contains an axes object. The axes object with xlabel Prediction, ylabel Target contains 2 objects of type scatter, line.

Make Predictions with New Data

Use the neural network to make a prediction with the first test image. To make a prediction with a single image, use the predict function. To use a GPU, first convert the data to gpuArray.

X = XTest(:,:,:,1);
if canUseGPU
    X = gpuArray(X);
end
Y = predict(net,X)

Y = single

34.7356

figure
imshow(X)
title("Angle: " + gather(Y))

Figure contains an axes object. The hidden axes object with title Angle: 34.7356 contains an object of type image.