## Define Custom Deep Learning Layers

### Tip

For a list of built-in layers in Deep Learning Toolbox™, see List of Deep Learning Layers.

This topic explains the architecture of deep learning layers and how to define custom layers to use for your problems.

| Type | Description |
|---|---|
| Layer | Define a custom deep learning layer and specify optional learnable parameters. For an example showing how to define a custom layer with learnable parameters, see Define Custom Deep Learning Layer with Learnable Parameters. For an example showing how to define a custom layer with multiple inputs, see Define Custom Deep Learning Layer with Multiple Inputs. |
| Classification Output Layer | Define a custom classification output layer and specify a loss function. For an example showing how to define a custom classification output layer and specify a loss function, see Define Custom Classification Output Layer. |
| Regression Output Layer | Define a custom regression output layer and specify a loss function. For an example showing how to define a custom regression output layer and specify a loss function, see Define Custom Regression Output Layer. |

### Layer Templates

You can use the following templates to define new layers.

### Intermediate Layer Architecture

During training, the software iteratively performs forward and backward passes through the network.

When making a forward pass through the network, each layer takes the outputs of the previous layers, applies a function, and then outputs (forward propagates) the results to the next layers.

Layers can have multiple inputs or outputs. For example, a layer can take X1, …, Xn from multiple previous layers and forward propagate the outputs Z1, …, Zm to the next layers.

At the end of a forward pass of the network, the output layer calculates the loss L between the predictions Y and the true targets T.

During the backward pass of a network, each layer takes the derivatives of the loss with respect to the outputs of the layer, computes the derivatives of the loss L with respect to the inputs, and then backward propagates the results. If the layer has learnable parameters, then the layer also computes the derivatives of the loss with respect to the layer weights (learnable parameters). The layer uses these derivatives to update the learnable parameters.
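To make the forward and backward passes concrete, here is a minimal numeric sketch in plain MATLAB (no toolbox calls). It assumes a hypothetical layer that computes Z = W .* X with a scalar learnable parameter W, followed by a squared-error loss; the learning rate is an arbitrary illustrative value, and the plain gradient-descent update stands in for whatever solver the training function actually uses.

```matlab
% Hypothetical single-layer network: Z = W .* X, with a scalar learnable W.
X = [1 2 3];          % layer input
T = [2 4 6];          % true targets
W = 1.5;              % learnable parameter
learnRate = 0.1;      % illustrative learning rate

% Forward pass: the layer output feeds a squared-error loss.
Z = W .* X;
L = 0.5 * sum((Z - T).^2);

% Backward pass: start from dL/dZ and apply the chain rule.
dLdZ = Z - T;                 % derivative of the loss w.r.t. the layer output
dLdX = dLdZ .* W;             % propagated back to the previous layer
dLdW = sum(dLdZ .* X);        % derivative w.r.t. the learnable parameter

% Parameter update (plain gradient descent).
W = W - learnRate * dLdW;
```

Here the loss is 1.75, the weight gradient is -7, and the update moves W from 1.5 to 2.2.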

The following figure describes the flow of data through a deep neural network and highlights the data flow through a layer with a single input X, a single output Z, and a learnable parameter W.

#### Intermediate Layer Properties

Declare the layer properties in the `properties` section of the class definition.

By default, custom intermediate layers have these properties:

| Property | Description |
|---|---|
| `Name` | Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty, unique layer name. If you train a series network with the layer and `Name` is set to `''`, then the software automatically assigns a name to the layer at training time. |
| `Description` | One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a `Layer` array. If you do not specify a layer description, then the software displays the layer class name. |
| `Type` | Type of the layer, specified as a character vector or a string scalar. The value of `Type` appears when the layer is displayed in a `Layer` array. If you do not specify a layer type, then the software displays the layer class name. |
| `NumInputs` | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumInputs` to the number of names in `InputNames`. The default value is 1. |
| `InputNames` | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumInputs` is greater than 1, then the software automatically sets `InputNames` to `{'in1',...,'inN'}`, where `N` is equal to `NumInputs`. The default value is `{'in'}`. |
| `NumOutputs` | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumOutputs` to the number of names in `OutputNames`. The default value is 1. |
| `OutputNames` | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumOutputs` is greater than 1, then the software automatically sets `OutputNames` to `{'out1',...,'outM'}`, where `M` is equal to `NumOutputs`. The default value is `{'out'}`. |

If the layer has no other properties, then you can omit the `properties` section.

### Tip

If you are creating a layer with multiple inputs, then you must set either the `NumInputs` or the `InputNames` property in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the `NumOutputs` or the `OutputNames` property in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.
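The default names described in the properties table follow a fixed pattern. The snippet below only reproduces that documented pattern in plain MATLAB for three inputs and two outputs; it does not call the toolbox. In an actual layer constructor you would set the property instead (for example, a line such as `layer.NumInputs = 3;`) and let the software derive the names.

```matlab
% Reproduce the default naming pattern {'in1',...,'inN'} and {'out1',...,'outM'}.
numInputs = 3;
inputNames = arrayfun(@(i) sprintf('in%d', i), 1:numInputs, ...
    'UniformOutput', false)

numOutputs = 2;
outputNames = arrayfun(@(j) sprintf('out%d', j), 1:numOutputs, ...
    'UniformOutput', false)
```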

#### Learnable Parameters

Declare the layer learnable parameters in the `properties (Learnable)` section of the class definition. If the layer has no learnable parameters, then you can omit the `properties (Learnable)` section.

Optionally, you can specify the learning rate factor and the L2 factor of the learnable parameters. By default, each learnable parameter has its learning rate factor and L2 factor set to `1`.

For both built-in and custom layers, you can set and get the learn rate factors and L2 regularization factors using the following functions.

FunctionDescription
`setLearnRateFactor`Set the learn rate factor of a learnable parameter.
`setL2Factor`Set the L2 regularization factor of a learnable parameter.
`getLearnRateFactor`Get the learn rate factor of a learnable parameter.
`getL2Factor`Get the L2 regularization factor of a learnable parameter.

To specify the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes `layer = setLearnRateFactor(layer,'MyParameterName',value)` and `layer = setL2Factor(layer,'MyParameterName',value)`, respectively.

To get the value of the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes `getLearnRateFactor(layer,'MyParameterName')` and `getL2Factor(layer,'MyParameterName')`, respectively.

For example, this syntax sets the learn rate factor of the learnable parameter with the name `'Alpha'` to `0.1`.

`layer = setLearnRateFactor(layer,'Alpha',0.1);`
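The same functions work on built-in layers. A minimal sketch, assuming Deep Learning Toolbox is installed: for `convolution2dLayer`, the learnable parameters are named `'Weights'` and `'Bias'`, and setting a factor through these functions corresponds to the layer's `WeightLearnRateFactor` and `WeightL2Factor` properties.

```matlab
% Requires Deep Learning Toolbox.
layer = convolution2dLayer(3,16);

% Slow down learning of the weights and double their L2 penalty.
layer = setLearnRateFactor(layer,'Weights',0.1);
layer = setL2Factor(layer,'Weights',2);

% Query the factors back.
lrFactor = getLearnRateFactor(layer,'Weights')
l2Factor = getL2Factor(layer,'Weights')
```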

#### Forward Functions

A layer uses one of two functions to perform a forward pass: `predict` or `forward`. If the forward pass is at prediction time, then the layer uses the `predict` function. If the forward pass is at training time, then the layer uses the `forward` function. If you do not require two different functions for prediction time and training time, then you can omit the `forward` function. In this case, the layer uses `predict` at training time.

If you define both a `forward` function and a custom `backward` function, then `forward` must assign a value to its `memory` output argument. You can use this value during backward propagation.

The syntax for `predict` is

`[Z1,…,Zm] = predict(layer,X1,…,Xn)`
where `X1,…,Xn` are the `n` layer inputs and `Z1,…,Zm` are the `m` layer outputs. The values `n` and `m` must correspond to the `NumInputs` and `NumOutputs` properties of the layer.
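As an illustration of what a `predict` body can compute, the sketch below evaluates a PReLU activation, f(x) = max(x,0) + alpha·min(x,0), on a 2-D image batch. It assumes one scaling coefficient per channel stored as a 1-by-1-by-c array. Inside a real layer, this expression would sit in `predict` and `alpha` would be a learnable property; here it is an anonymous function so that it runs on its own. Implicit expansion (R2016b or later) broadcasts `alpha` across height, width, and observations.

```matlab
% PReLU forward computation, written as an anonymous function.
preluPredict = @(X, alpha) max(X, 0) + alpha .* min(X, 0);

rng(0);                                        % reproducible input
numChannels = 3;
X = randn(4, 4, numChannels, 2);               % h-by-w-by-c-by-N batch
alpha = reshape([0.1 0.2 0.3], 1, 1, numChannels);

Z = preluPredict(X, alpha);                    % same size as X
```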

### Tip

If the number of inputs to `predict` can vary, then use `varargin` instead of `X1,…,Xn`. In this case, `varargin` is a cell array of the inputs, where `varargin{i}` corresponds to `Xi`. If the number of outputs can vary, then use `varargout` instead of `Z1,…,Zm`. In this case, `varargout` is a cell array of the outputs, where `varargout{j}` corresponds to `Zj`.
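A sketch of the `varargin` pattern, assuming a hypothetical addition layer whose `predict` would have the signature `function Z = predict(layer, varargin)`. The body is shown as an anonymous function: it stacks any number of equally sized inputs along a new trailing dimension and sums over that dimension.

```matlab
% Sum any number of equally sized inputs passed through varargin.
sumPredict = @(varargin) sum(cat(ndims(varargin{1}) + 1, varargin{:}), ...
    ndims(varargin{1}) + 1);

X1 = ones(2, 2, 3, 4);
X2 = 2 * ones(2, 2, 3, 4);
X3 = 3 * ones(2, 2, 3, 4);

Z = sumPredict(X1, X2, X3);   % every element equals 1 + 2 + 3
```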

The syntax for `forward` is

`[Z1,…,Zm,memory] = forward(layer,X1,…,Xn)`
where `X1,…,Xn` are the `n` layer inputs, `Z1,…,Zm` are the `m` layer outputs, and `memory` is the memory of the layer.

### Tip

If the number of inputs to `forward` can vary, then use `varargin` instead of `X1,…,Xn`. In this case, `varargin` is a cell array of the inputs, where `varargin{i}` corresponds to `Xi`. If the number of outputs can vary, then use `varargout` instead of `Z1,…,Zm`. In this case, `varargout` is a cell array of the outputs, where `varargout{j}` corresponds to `Zj` for `j`=1,…,`NumOutputs` and `varargout{NumOutputs+1}` corresponds to `memory`.
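One common reason to keep separate `predict` and `forward` functions is inverted dropout: training-time `forward` samples a random mask, and `backward` needs that same mask. The sketch below uses plain MATLAB statements in place of the three layer methods; the dropout scheme, the drop probability, and the variable names are illustrative assumptions, not part of the layer API.

```matlab
rng(0);                                   % reproducible mask
X = randn(4, 4, 3, 2);
dropProb = 0.5;

% forward (training time): sample a mask, apply it, and return it as memory.
mask = (rand(size(X)) > dropProb) / (1 - dropProb);
Z = X .* mask;
memory = mask;

% predict (prediction time): no dropout, so the input passes through.
ZPredict = X;

% backward: reuse the mask saved in memory instead of resampling it.
dLdZ = ones(size(Z));
dLdX = dLdZ .* memory;
```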

The dimensions of the inputs depend on the type of data and the output of the connected layers:

| Layer Input | Input Size | Observation Dimension |
|---|---|---|
| 2-D images | h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations. | 4 |
| 3-D images | h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and N is the number of observations. | 5 |
| Vector sequences | c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length. | 2 |
| 2-D image sequences | h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, N is the number of observations, and S is the sequence length. | 4 |
| 3-D image sequences | h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, N is the number of observations, and S is the sequence length. | 5 |
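Because the observation dimension differs between input types, code inside a layer that needs the mini-batch size must index the dimension given in the table. A small sketch with randomly generated arrays of illustrative sizes:

```matlab
% 2-D image batch: 28-by-28 grayscale images, 50 observations (dimension 4).
XImages = rand(28, 28, 1, 50);
numImageObs = size(XImages, 4)

% Vector sequence batch: 12 features, 8 observations, 100 time steps (dimension 2).
XSeq = rand(12, 8, 100);
numSeqObs = size(XSeq, 2)
seqLength = size(XSeq, 3)
```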

#### Backward Function

The layer backward function computes the derivatives of the loss with respect to the input data and then outputs (backward propagates) the results to the previous layer. If the layer has learnable parameters (for example, layer weights), then `backward` also computes the derivatives of the loss with respect to the learnable parameters. When you use the `trainNetwork` function, the layer automatically updates the learnable parameters using these derivatives during the backward pass.

Defining the backward function is optional. If you do not specify a backward function, and the layer forward functions support `dlarray` objects, then the software automatically determines the backward function using automatic differentiation. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support. Define a custom backward function when you want to:

• Use a specific algorithm to compute the derivatives.

• Use operations in the forward functions that do not support `dlarray` objects.
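To see what automatic differentiation produces when no `backward` function is defined, the sketch below does not define a layer; it applies `dlfeval` and `dlgradient` directly to the PReLU expression and checks the result against hand-derived formulas. The per-channel `alpha` and the summed objective are illustrative assumptions. Deep Learning Toolbox is required.

```matlab
% Requires Deep Learning Toolbox (dlarray, dlfeval, dlgradient).
rng(0);
X = dlarray(randn(3, 3, 2, 4));
alpha = dlarray(reshape([0.25 0.5], 1, 1, 2));

% Scalar objective built from the PReLU forward computation.
% dlgradient must run inside dlfeval; the anonymous function forwards
% both requested outputs of dlgradient.
gradFcn = @(X, alpha) dlgradient( ...
    sum(max(X, 0) + alpha .* min(X, 0), 'all'), X, alpha);

[dLdX, dLdAlpha] = dlfeval(gradFcn, X, alpha);
```

For this objective, the derivative with respect to X is 1 where X is positive and alpha where X is negative, and the derivative with respect to each alpha is the sum of min(X,0) over that channel.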

To define a custom backward function, create a function named `backward`.

The syntax for `backward` is

`[dLdX1,…,dLdXn,dLdW1,…,dLdWk] = backward(layer,X1,…,Xn,Z1,…,Zm,dLdZ1,…,dLdZm,memory)`
where:

• `X1,…,Xn` are the `n` layer inputs

• `Z1,…,Zm` are the `m` outputs of the layer forward functions

• `dLdZ1,…,dLdZm` are the gradients backward propagated from the next layer

• `memory` is the memory output of `forward` if `forward` is defined; otherwise, `memory` is `[]`.

For the outputs, `dLdX1,…,dLdXn` are the derivatives of the loss with respect to the layer inputs and `dLdW1,…,dLdWk` are the derivatives of the loss with respect to the `k` learnable parameters. To reduce memory usage by preventing unused variables from being saved between the forward and backward passes, replace the corresponding input arguments with `~`.

### Tip

If the number of inputs to `backward` can vary, then use `varargin` instead of the input arguments after `layer`. In this case, `varargin` is a cell array of the inputs, where `varargin{i}` corresponds to `Xi` for `i`=1,…,`NumInputs`, `varargin{NumInputs+j}` and `varargin{NumInputs+NumOutputs+j}` correspond to `Zj` and `dLdZj`, respectively, for `j`=1,…,`NumOutputs`, and `varargin{end}` corresponds to `memory`.

If the number of outputs can vary, then use `varargout` instead of the output arguments. In this case, `varargout` is a cell array of the outputs, where `varargout{i}` corresponds to `dLdXi` for `i`=1,…,`NumInputs` and `varargout{NumInputs+t}` corresponds to `dLdWt` for `t`=1,…,`k`, where `k` is the number of learnable parameters.

The values of `X1,…,Xn` and `Z1,…,Zm` are the same as in the forward functions. The dimensions of `dLdZ1,…,dLdZm` are the same as the dimensions of `Z1,…,Zm`, respectively.

The dimensions and data types of `dLdX1,…,dLdXn` are the same as the dimensions and data types of `X1,…,Xn`, respectively. The dimensions and data types of `dLdW1,…,dLdWk` are the same as the dimensions and data types of `W1,…,Wk`, respectively.
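A sketch of hand-written PReLU derivatives that satisfy these size rules, assuming one coefficient `alpha` per channel (1-by-1-by-c) and 2-D image input. `dLdAlpha` sums over the height, width, and observation dimensions (1, 2, and 4) because each coefficient is shared by every element of its channel; that sum is what makes `dLdAlpha` the same size as `alpha`. Treating X = 0 as part of the negative branch is an arbitrary convention.

```matlab
% Derivatives of Z = max(X,0) + alpha .* min(X,0), given dL/dZ.
preluBackwardX     = @(X, alpha, dLdZ) dLdZ .* ((X > 0) + alpha .* (X <= 0));
preluBackwardAlpha = @(X, dLdZ) sum(dLdZ .* min(X, 0), [1 2 4]);

rng(0);
X = randn(5, 5, 3, 2);
alpha = reshape([0.1 0.2 0.3], 1, 1, 3);
dLdZ = randn(size(X));                  % gradient arriving from the next layer

dLdX = preluBackwardX(X, alpha, dLdZ);        % same size as X
dLdAlpha = preluBackwardAlpha(X, dLdZ);       % same size as alpha
```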

To calculate the derivatives of the loss, you can use the chain rule:

$$\frac{\partial L}{\partial X_i} = \sum_j \frac{\partial L}{\partial Z_j} \frac{\partial Z_j}{\partial X_i}$$

$$\frac{\partial L}{\partial W_i} = \sum_j \frac{\partial L}{\partial Z_j} \frac{\partial Z_j}{\partial W_i}$$
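The sum over j matters when a layer has several outputs. The sketch below assumes a hypothetical layer with one input X, one scalar learnable parameter W, and two outputs Z1 = W .* X and Z2 = X.^2, followed by an illustrative linear loss. It applies the chain rule and then confirms both gradients with central finite differences, which is similar in spirit to the gradient checks that `checkLayer` runs.

```matlab
% Hypothetical layer: Z1 = W .* X, Z2 = X.^2.
% Illustrative downstream loss: L = sum(C1 .* Z1) + sum(C2 .* Z2).
rng(0);
X  = randn(3, 4);
W  = 0.7;
C1 = randn(3, 4);
C2 = randn(3, 4);
lossFcn = @(X, W) sum(C1 .* (W .* X), 'all') + sum(C2 .* X.^2, 'all');

% Gradients arriving from the next layers.
dLdZ1 = C1;
dLdZ2 = C2;

% Chain rule: add up the contribution of every output Z_j.
dLdX = dLdZ1 .* W + dLdZ2 .* (2 * X);   % dZ1/dX = W, dZ2/dX = 2X
dLdW = sum(dLdZ1 .* X, 'all');          % only Z1 depends on W

% Central finite-difference check of both gradients.
h = 1e-6;
numdLdX = zeros(size(X));
for k = 1:numel(X)
    E = zeros(size(X));
    E(k) = h;
    numdLdX(k) = (lossFcn(X + E, W) - lossFcn(X - E, W)) / (2 * h);
end
numdLdW = (lossFcn(X, W + h) - lossFcn(X, W - h)) / (2 * h);

maxErrX = max(abs(dLdX - numdLdX), [], 'all')
errW = abs(dLdW - numdLdW)
```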

When using the `trainNetwork` function, the layer automatically updates the learnable parameters using the derivatives `dLdW1`,…,`dLdWk` during the backward pass.

For an example showing how to define a custom backward function, see Specify Custom Layer Backward Function.

#### GPU Compatibility

If the layer forward functions fully support `dlarray` objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type `gpuArray`.

Many MATLAB® built-in functions support `gpuArray` and `dlarray` input arguments. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).
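A minimal sketch of why `dlarray` support implies GPU compatibility, assuming Deep Learning Toolbox is installed. The data is moved to the GPU only when `canUseGPU` reports a supported device (Parallel Computing Toolbox required for that path). The PReLU expression uses only `dlarray`-supported operations, so the same line runs on CPU or GPU data.

```matlab
% Requires Deep Learning Toolbox; uses the GPU only if one is available.
X = rand(28, 28, 1, 8, 'single');
if canUseGPU
    X = gpuArray(X);            % move the data to the GPU
end
X = dlarray(X, 'SSCB');         % spatial, spatial, channel, batch

% max, min, and elementwise arithmetic all support dlarray inputs.
alpha = 0.25;
Z = max(X, 0) + alpha .* min(X, 0);

onGPU = isa(extractdata(Z), 'gpuArray')
```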

### Check Validity of Layer

If you create a custom deep learning layer, then you can use the `checkLayer` function to check that the layer is valid. The function checks layers for validity, GPU compatibility, and correctly defined gradients. To check that a layer is valid, run the following command:

`checkLayer(layer,validInputSize,'ObservationDimension',dim)`
where `layer` is an instance of the layer, `validInputSize` is a vector or cell array specifying the valid input sizes to the layer, and `dim` specifies the dimension of the observations in the layer input data. For large input sizes, the gradient checks take longer to run. To speed up the tests, specify a smaller valid input size.

#### Check Validity of Layer Using `checkLayer`

Check the layer validity of the custom layer `preluLayer`.

Define a custom PReLU layer. To create this layer, save the file `preluLayer.m` in the current folder.

Create an instance of the layer and check its validity using `checkLayer`. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the typical size of the input of an observation and set `'ObservationDimension'` to 4.

```
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
```
```
Running nnet.checklayer.TestLayerWithoutBackward
.......... .......
Done nnet.checklayer.TestLayerWithoutBackward
__________

Test Summary:
     17 Passed, 0 Failed, 0 Incomplete, 0 Skipped.
     Time elapsed: 1.5273 seconds.
```

Here, the function does not detect any issues with the layer.

### Include Layer in Network

You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.

Define a custom PReLU layer. To create this layer, save the file `preluLayer.m` in the current folder.

Create a layer array that includes the custom layer `preluLayer`.

```
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    batchNormalizationLayer
    preluLayer(20,'prelu')
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
```

### Output Layer Architecture

At the end of a forward pass at training time, an output layer takes the predictions (outputs) Y of the previous layer and calculates the loss L between these predictions and the training targets. The output layer computes the derivatives of the loss L with respect to the predictions Y and outputs (backward propagates) the results to the previous layer.

The following figure describes the flow of data through a convolutional neural network and an output layer.

#### Output Layer Properties

Declare the layer properties in the `properties` section of the class definition.

By default, custom output layers have the following properties:

• `Name` – Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty, unique layer name. If you train a series network with the layer and `Name` is set to `''`, then the software automatically assigns a name to the layer at training time.

• `Description` – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a `Layer` array. If you do not specify a layer description, then the software displays `"Classification Output"` or `"Regression Output"`.

• `Type` – Type of the layer, specified as a character vector or a string scalar. The value of `Type` appears when the layer is displayed in a `Layer` array. If you do not specify a layer type, then the software displays the layer class name.

Custom classification layers also have the following property:

• `Classes` – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or `'auto'`. If `Classes` is `'auto'`, then the software automatically sets the classes at training time. If you specify the string array or cell array of character vectors `str`, then the software sets the classes of the output layer to `categorical(str,str)`. The default value is `'auto'`.

Custom regression layers also have the following property:

• `ResponseNames` – Names of the responses, specified as a cell array of character vectors or a string array. At training time, the software automatically sets the response names according to the training data. The default is `{}`.

If the layer has no other properties, then you can omit the `properties` section.

#### Loss Functions

The output layer computes the loss `L` between predictions and targets using the forward loss function and computes the derivatives of the loss with respect to the predictions using the backward loss function.

The syntax for `forwardLoss` is `loss = forwardLoss(layer, Y, T)`. The input `Y` corresponds to the predictions made by the network. These predictions are the output of the previous layer. The input `T` corresponds to the training targets. The output `loss` is the loss between `Y` and `T` according to the specified loss function. The output `loss` must be a scalar.
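As an illustration, here is a cross-entropy loss that a hypothetical classification layer could compute in `forwardLoss`, written as an anonymous function so that it runs on its own. It assumes 2-D image classification, so `Y` and `T` are 1-by-1-by-K-by-N, and averages over the N observations. A production loss would typically also guard against `log(0)`.

```matlab
% Cross-entropy for 2-D image classification, averaged over observations.
forwardLossCE = @(Y, T) -sum(T .* log(Y), 'all') / size(Y, 4);

% Two observations, three classes; each observation's scores sum to 1.
Y = reshape([0.7 0.2 0.1, 0.1 0.8 0.1], 1, 1, 3, 2);
T = reshape([1 0 0, 0 1 0], 1, 1, 3, 2);

loss = forwardLossCE(Y, T)   % scalar: -(log(0.7) + log(0.8)) / 2
```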

If the layer forward loss function supports `dlarray` objects, then the software automatically determines the backward loss function. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support. Alternatively, to define a custom backward loss function, create a function named `backwardLoss`. For an example showing how to define a custom backward loss function, see Specify Custom Output Layer Backward Loss Function.

The syntax for `backwardLoss` is `dLdY = backwardLoss(layer, Y, T)`. The input `Y` contains the predictions made by the network and `T` contains the training targets. The output `dLdY` is the derivative of the loss with respect to the predictions `Y`. The output `dLdY` must be the same size as the layer input `Y`.
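Continuing the hypothetical cross-entropy loss, the derivative with respect to each prediction is -T./Y divided by N, which has the same size as `Y`. The sketch checks that formula element by element with central finite differences; the data values are made up for illustration.

```matlab
forwardLossCE  = @(Y, T) -sum(T .* log(Y), 'all') / size(Y, 4);
backwardLossCE = @(Y, T) -(T ./ Y) / size(Y, 4);     % same size as Y

Y = reshape([0.7 0.2 0.1, 0.1 0.8 0.1], 1, 1, 3, 2);
T = reshape([1 0 0, 0 1 0], 1, 1, 3, 2);
dLdY = backwardLossCE(Y, T);

% Central finite-difference check, one element at a time.
h = 1e-6;
numdLdY = zeros(size(Y));
for k = 1:numel(Y)
    E = zeros(size(Y));
    E(k) = h;
    numdLdY(k) = (forwardLossCE(Y + E, T) - forwardLossCE(Y - E, T)) / (2 * h);
end
maxErr = max(abs(dLdY - numdLdY), [], 'all')
```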

For classification problems, the dimensions of `T` depend on the type of problem.

| Classification Task | Input Size | Observation Dimension |
|---|---|---|
| 2-D image classification | 1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations. | 4 |
| 3-D image classification | 1-by-1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations. | 5 |
| Sequence-to-label classification | K-by-N, where K is the number of classes and N is the number of observations. | 2 |
| Sequence-to-sequence classification | K-by-N-by-S, where K is the number of classes, N is the number of observations, and S is the sequence length. | 2 |

The size of `Y` depends on the output of the previous layer. To ensure that `Y` is the same size as `T`, you must include a layer that outputs the correct size before the output layer. For example, to ensure that `Y` is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K followed by a softmax layer before the output layer.
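The software builds `T` from the training labels for you. To picture the layout that `forwardLoss` receives for 2-D image classification, the sketch below assumes one-hot encoded targets and builds a 1-by-1-by-K-by-N array from hypothetical class indices.

```matlab
% Hypothetical labels for N = 4 observations, given as class indices 1..K.
labels = [2 1 3 2];
K = 3;
N = numel(labels);

% T is 1-by-1-by-K-by-N with a single 1 per observation.
T = zeros(1, 1, K, N);
for n = 1:N
    T(1, 1, labels(n), n) = 1;
end
```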

For regression problems, the dimensions of `T` also depend on the type of problem.

| Regression Task | Input Size | Observation Dimension |
|---|---|---|
| 2-D image regression | 1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations. | 4 |
| 2-D image-to-image regression | h-by-w-by-c-by-N, where h, w, and c are the height, width, and number of channels of the output, respectively, and N is the number of observations. | 4 |
| 3-D image regression | 1-by-1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations. | 5 |
| 3-D image-to-image regression | h-by-w-by-d-by-c-by-N, where h, w, d, and c are the height, width, depth, and number of channels of the output, respectively, and N is the number of observations. | 5 |
| Sequence-to-one regression | R-by-N, where R is the number of responses and N is the number of observations. | 2 |
| Sequence-to-sequence regression | R-by-N-by-S, where R is the number of responses, N is the number of observations, and S is the sequence length. | 2 |

For example, for a 2-D image regression network with one response and mini-batches of size 50, `T` is a 4-D array of size 1-by-1-by-1-by-50.

The size of `Y` depends on the output of the previous layer. To ensure that `Y` is the same size as `T`, you must include a layer that outputs the correct size before the output layer. For example, for image regression with R responses, to ensure that `Y` is a 4-D array of the correct size, you can include a fully connected layer of size R before the output layer.

The `forwardLoss` and `backwardLoss` functions have the following output arguments.

| Function | Output Argument | Description |
|---|---|---|
| `forwardLoss` | `loss` | Calculated loss between the predictions `Y` and the true target `T`. |
| `backwardLoss` | `dLdY` | Derivative of the loss with respect to the predictions `Y`. |

The `backwardLoss` function must output `dLdY` with the size expected by the previous layer, which is the same size as `Y`.
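A sketch of a forward and backward pair for a mean absolute error loss, assuming 2-D image regression so that `Y` and `T` are 1-by-1-by-R-by-N. It averages over the R responses and the N observations, and uses `sign` for the derivative (which returns 0 where Y equals T). This is an illustration only and need not match the `maeRegressionLayer.m` file used in the example that follows.

```matlab
% Mean absolute error: average over responses (dim 3) and observations (dim 4).
forwardLossMAE  = @(Y, T) sum(abs(Y - T), 'all') / (size(Y, 3) * size(Y, 4));
backwardLossMAE = @(Y, T) sign(Y - T) / (size(Y, 3) * size(Y, 4));

% One response (R = 1) and a mini-batch of four observations.
Y = reshape([10 -5 3 0], 1, 1, 1, 4);
T = reshape([12 -5 1 4], 1, 1, 1, 4);

loss = forwardLossMAE(Y, T)     % (2 + 0 + 2 + 4) / 4 = 2
dLdY = backwardLossMAE(Y, T);   % same size as Y
```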

#### GPU Compatibility

If the layer forward functions fully support `dlarray` objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type `gpuArray`.

Many MATLAB built-in functions support `gpuArray` and `dlarray` input arguments. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

#### Include Custom Regression Output Layer in Network

You can use a custom output layer in the same way as any other output layer in Deep Learning Toolbox. This section shows how to create and train a network for regression using a custom output layer.

The example constructs a convolutional neural network architecture, trains a network, and uses the trained network to predict angles of rotated, handwritten digits. These predictions are useful for optical character recognition.

Define a custom mean absolute error regression layer. To create this layer, save the file `maeRegressionLayer.m` in the current folder.

Load the training data.

`[XTrain,~,YTrain] = digitTrain4DArrayData;`

Create a layer array and include the custom regression output layer `maeRegressionLayer`.

```
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(1)
    maeRegressionLayer('mae')]
```
```
layers = 
  6x1 Layer array with layers:

     1   ''      Image Input           28x28x1 images with 'zerocenter' normalization
     2   ''      Convolution           20 5x5 convolutions with stride [1 1] and padding [0 0 0 0]
     3   ''      Batch Normalization   Batch normalization
     4   ''      ReLU                  ReLU
     5   ''      Fully Connected       1 fully connected layer
     6   'mae'   Regression Output     Mean absolute error
```

Set the training options and train the network.

```
options = trainingOptions('sgdm','Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);
```

Evaluate the network performance by calculating the prediction error between the predicted and actual angles of rotation.

```
[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);
predictionError = YTest - YPred;
```

Calculate the number of predictions within an acceptable error margin from the true angles. Set the threshold to 10 degrees and calculate the percentage of predictions within this threshold.

```
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(XTest,4);
accuracy = numCorrect/numTestImages
```
```
accuracy = 0.7524
```
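The mean of a logical vector gives the same fraction as `numCorrect/numTestImages`, and the root-mean-square error is another common summary of regression error. The error values below are made up so that the snippet runs without training a network; with the real `predictionError`, the same two lines apply unchanged.

```matlab
% Hypothetical prediction errors in degrees (YTest - YPred).
predictionError = [2; -15; 7; 0.5; -12];

thr = 10;
accuracy = mean(abs(predictionError) < thr)   % fraction within 10 degrees (0.6 here)
rmse = sqrt(mean(predictionError.^2))         % root-mean-square error
```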