dlconv

Deep learning convolution

Description

The convolution operation applies sliding filters to the input data. Use the dlconv function for deep learning convolution, grouped convolution, and channel-wise separable convolution.

The dlconv function applies the deep learning convolution operation to dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the 'S', 'T', 'C', and 'B' labels, respectively. For unspecified and other dimensions, use the 'U' label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the 'DataFormat' option.

Note

To apply convolution within a layerGraph object or Layer array, use a convolution layer, such as convolution2dLayer.

dlY = dlconv(dlX,weights,bias) applies the deep learning convolution operation to the formatted dlarray object dlX. The function uses sliding convolutional filters defined by weights and adds the constant bias. The output dlY is a formatted dlarray object with the same format as dlX.

The function, by default, convolves over up to three dimensions of dlX labeled 'S' (spatial). To convolve over dimensions labeled 'T' (time), specify weights with a 'T' dimension using a formatted dlarray object or by using the 'WeightsFormat' option.

For unformatted input data, use the 'DataFormat' option.

dlY = dlconv(dlX,weights,bias,'DataFormat',FMT) applies the deep learning convolution operation to the unformatted dlarray object dlX with format specified by FMT. The output dlY is an unformatted dlarray object with dimensions in the same order as dlX. For example, 'DataFormat','SSCB' specifies data for 2-D convolution with format 'SSCB' (spatial, spatial, channel, batch).

dlY = dlconv(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'WeightsFormat','TCU' specifies weights for 1-D convolution with format 'TCU' (time, channel, unspecified).

Examples

Create a formatted dlarray object containing a batch of 128 28-by-28 images with 3 channels. Specify the format 'SSCB' (spatial, spatial, channel, batch).

miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
dlX = dlarray(X,'SSCB');

View the size and format of the input data.

size(dlX)
ans = 1×4

28    28     3   128

dims(dlX)
ans =
'SSCB'

Initialize the weights and bias for 2-D convolution. For the weights, specify 64 3-by-3 filters. For the bias, specify a vector of zeros.

filterSize = [3 3];
numFilters = 64;
weights = rand(filterSize(1),filterSize(2),numChannels,numFilters);
bias = zeros(1,numFilters);

Apply 2-D convolution using the dlconv function.

dlY = dlconv(dlX,weights,bias);

View the size and format of the output.

size(dlY)
ans = 1×4

26    26    64   128

dims(dlY)
ans =
'SSCB'
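
The 26-by-26 spatial size follows from the filter sliding only over positions where it fits entirely inside the input. The following is a minimal sketch of that arithmetic, assuming the default options (stride 1, no padding, no dilation). The variable expectedSize is introduced here for illustration only.

```matlab
% Spatial output size with stride 1, no padding, and no dilation:
% each spatial dimension shrinks by (filterSize - 1).
inputSize = [28 28];
filterSize = [3 3];
expectedSize = inputSize - filterSize + 1;
disp(expectedSize)   % 26 26, the first two elements of size(dlY)
```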

Convolve the input data in three groups of two channels each. Apply four filters per group.

Create the input data as 10 observations of size 100-by-100 with six channels.

height = 100;
width = 100;
channels = 6;
numObservations = 10;

X = rand(height,width,channels,numObservations);
dlX = dlarray(X,'SSCB');

Initialize the convolutional filters. Specify three groups of convolutions that each apply four convolution filters to two channels of the input data.

filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 2;
numFiltersPerGroup = 4;
numGroups = 3;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution.

dlY = dlconv(dlX,weights,bias);
size(dlY)
ans = 1×4

93    93    12    10

dims(dlY)
ans =
'SSCB'

The 12 channels of the convolution output represent the three groups of convolutions with four filters per group.
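
One way to check this interpretation is to convolve a single group on its own and compare it with the matching slice of the grouped result. The sketch below uses smaller, hypothetical sizes than the example above, and it assumes that the output channels are ordered group by group (the first four output channels belong to the first group); treat it as an illustration rather than a statement of the implementation.

```matlab
% Small grouped convolution: 3 groups, 2 channels and 4 filters per group.
X = rand(20,20,6,2);
dlX = dlarray(X,'SSCB');
weights = rand(5,5,2,4,3);
bias = rand(12,1);
dlY = dlconv(dlX,weights,bias);

% Convolve only the first group: input channels 1-2 with the first four filters.
dlX1 = dlarray(X(:,:,1:2,:),'SSCB');
dlY1 = dlconv(dlX1,weights(:,:,:,:,1),bias(1:4));

% Assumption: output channels 1-4 of the grouped result belong to group 1.
difference = extractdata(dlY(:,:,1:4,:)) - extractdata(dlY1);
maxDiff = max(abs(difference),[],'all')   % expected to be at or near zero
```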

Separate the input data into channels and perform convolution on each channel separately.

Create the input data as a single observation with a size of 64-by-64 and 10 channels. Create the data as an unformatted dlarray.

height = 64;
width = 64;
channels = 10;

X = rand(height,width,channels);
dlX = dlarray(X);

Initialize the convolutional filters. Specify a grouped convolution with one channel and one filter per group, and with the number of groups equal to the number of input channels, so that each of the 10 channels is convolved separately.

filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 1;
numFiltersPerGroup = 1;
numGroups = channels;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution. Specify the dimension labels of the input data using the 'DataFormat' option.

dlY = dlconv(dlX,weights,bias,'DataFormat','SSC');
size(dlY)
ans = 1×3

57    57    10

Each channel is convolved separately, so there are 10 channels in the output.

Create a formatted dlarray object containing 128 sequences of length 512 containing 5 features. Specify the format 'CBT' (channel, batch, time).

numChannels = 5;
miniBatchSize = 128;
sequenceLength = 512;
X = rand(numChannels,miniBatchSize,sequenceLength);
dlX = dlarray(X,'CBT');

Initialize the weights and bias for 1-D convolution. For the weights, specify 64 filters with a filter size of 3. For the bias, specify a vector of zeros.

filterSize = 3;
numFilters = 64;
weights = rand(filterSize,numChannels,numFilters);
bias = zeros(1,numFilters);

Apply 1-D convolution using the dlconv function. To convolve over the 'T' (time) dimension of the input data, specify the weights format 'TCU' (time, channel, unspecified) using the 'WeightsFormat' option.

dlY = dlconv(dlX,weights,bias,'WeightsFormat','TCU');

View the size and format of the output.

size(dlY)
ans = 1×3

64   128   510

dims(dlY)
ans =
'CBT'
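
As an alternative to the 'WeightsFormat' option, the weights can carry the 'T' label themselves. The following sketch repeats the 1-D convolution with the weights created as a formatted dlarray object; the sizes are the same as in the example above, and the output is expected to have the same size and format.

```matlab
numChannels = 5;
miniBatchSize = 128;
sequenceLength = 512;
dlX = dlarray(rand(numChannels,miniBatchSize,sequenceLength),'CBT');

% Label the weights directly instead of passing 'WeightsFormat'.
filterSize = 3;
numFilters = 64;
weights = dlarray(rand(filterSize,numChannels,numFilters),'TCU');
bias = zeros(1,numFilters);

dlY = dlconv(dlX,weights,bias);
size(dlY)   % 64 128 510
dims(dlY)   % 'CBT'
```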

Input Arguments

Input data, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

If dlX is an unformatted dlarray or a numeric array, then you must specify the format using the 'DataFormat' option. If dlX is a numeric array, then either weights or bias must be a dlarray object.
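
For instance, the following sketch passes the input data as a plain numeric array. Because the input is numeric, the weights are wrapped in a dlarray object and the data format is given with the 'DataFormat' option. The sizes are arbitrary values chosen for illustration.

```matlab
X = rand(28,28,3,2);                 % plain numeric array, no dimension labels
weights = dlarray(rand(3,3,3,8));    % unformatted dlarray in the default 2-D weights order
bias = 0;

dlY = dlconv(X,weights,bias,'DataFormat','SSCB');
class(dlY)   % dlarray
size(dlY)    % 26 26 8 2
```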

The function, by default, convolves over up to three dimensions of dlX labeled 'S' (spatial). To convolve over dimensions labeled 'T' (time), specify weights with a 'T' dimension using a formatted dlarray object or by using the 'WeightsFormat' option.

Convolutional filters, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

The size and format of the weights depend on the type of task. If weights is an unformatted dlarray or a numeric array, then the size and shape of weights depend on the 'WeightsFormat' option.

The following list describes the size and format of the weights for various tasks. You can specify an array with the dimensions in any order using formatted dlarray objects or by using the 'WeightsFormat' option. When the weights have multiple dimensions with the same label (for example, multiple dimensions labeled 'S'), those dimensions must be ordered as described in this list.

1-D convolution

Required dimensions:

• 'S' (spatial) or 'T' (time): Filter size

• 'C' (channel): Number of channels

• 'U' (unspecified): Number of filters

Example weights: filterSize-by-numChannels-by-numFilters array, where filterSize is the size of the 1-D filters, numChannels is the number of channels of the input data, and numFilters is the number of filters.

Example format: 'SCU' (spatial, channel, unspecified)

1-D grouped convolution

Required dimensions:

• 'S' (spatial) or 'T' (time): Filter size

• 'C' (channel): Number of channels per group

• First 'U' (unspecified): Number of filters per group

• Second 'U' (unspecified): Number of groups

Example weights: filterSize-by-numChannelsPerGroup-by-numFiltersPerGroup-by-numGroups array, where filterSize is the size of the 1-D filters, numChannelsPerGroup is the number of channels per group of the input data, and numFiltersPerGroup is the number of filters per group. numChannelsPerGroup must equal the number of the channels of the input data divided by numGroups.

Example format: 'SCUU' (spatial, channel, unspecified, unspecified)

2-D convolution

Required dimensions:

• First 'S' (spatial): Filter height

• Second 'S' (spatial) or 'T' (time): Filter width

• 'C' (channel): Number of channels

• 'U' (unspecified): Number of filters

Example weights: filterSize(1)-by-filterSize(2)-by-numChannels-by-numFilters array, where filterSize(1) and filterSize(2) are the height and width of the 2-D filters, respectively, numChannels is the number of channels of the input data, and numFilters is the number of filters.

Example format: 'SSCU' (spatial, spatial, channel, unspecified)

2-D grouped convolution

Required dimensions:

• First 'S' (spatial): Filter height

• Second 'S' (spatial) or 'T' (time): Filter width

• 'C' (channel): Number of channels per group

• First 'U' (unspecified): Number of filters per group

• Second 'U' (unspecified): Number of groups

Example weights: filterSize(1)-by-filterSize(2)-by-numChannelsPerGroup-by-numFiltersPerGroup-by-numGroups array, where filterSize(1) and filterSize(2) are the height and width of the 2-D filters, respectively, numChannelsPerGroup is the number of channels per group of the input data, and numFiltersPerGroup is the number of filters per group. numChannelsPerGroup must equal the number of the channels of the input data divided by numGroups.

Example format: 'SSCUU' (spatial, spatial, channel, unspecified, unspecified)

3-D convolution

Required dimensions:

• First 'S' (spatial): Filter height

• Second 'S' (spatial): Filter width

• Third 'S' (spatial) or 'T' (time): Filter depth

• 'C' (channel): Number of channels

• 'U' (unspecified): Number of filters

Example weights: filterSize(1)-by-filterSize(2)-by-filterSize(3)-by-numChannels-by-numFilters array, where filterSize(1), filterSize(2), and filterSize(3) are the height, width, and depth of the 3-D filters, respectively, numChannels is the number of channels of the input data, and numFilters is the number of filters.

Example format: 'SSSCU' (spatial, spatial, spatial, channel, unspecified)

For channel-wise separable (also known as depth-wise separable) convolution, use grouped convolution with the number of groups equal to the number of channels.

Tip

The function, by default, convolves over up to three dimensions of dlX labeled 'S' (spatial). To convolve over dimensions labeled 'T' (time), specify weights with a 'T' dimension using a formatted dlarray object or by using the 'WeightsFormat' option.

Bias constant, specified as a formatted dlarray, an unformatted dlarray, a numeric vector, or a numeric scalar.

• If bias is a scalar, then the same bias is applied to each output.

• If bias has a nonsingleton dimension, then each element of bias is the bias applied to the corresponding convolutional filter specified by weights. The number of elements of bias must match the number of filters specified by weights.

• If bias is 0, then the bias term is disabled and no bias is added during the convolution operation.

If bias is a formatted dlarray, then the nonsingleton dimension must be a channel dimension with label 'C' (channel).
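
The three bias cases can be compared side by side. The following is a minimal sketch with arbitrary sizes and bias values chosen for illustration; the variable names are hypothetical.

```matlab
dlX = dlarray(rand(8,8,3,2),'SSCB');
weights = rand(3,3,3,4);

dlYNone = dlconv(dlX,weights,0);                    % bias of 0: no bias added
dlYScalar = dlconv(dlX,weights,0.5);                % scalar: 0.5 added to every output
dlYVector = dlconv(dlX,weights,[0.1 0.2 0.3 0.4]);  % one bias value per filter

% The scalar-bias result should differ from the no-bias result by 0.5 everywhere.
offset = extractdata(dlYScalar - dlYNone);
maxDeviation = max(abs(offset - 0.5),[],'all')

% The output of filter 3 should be shifted by its own bias, 0.3.
shift3 = extractdata(dlYVector(:,:,3,:) - dlYNone(:,:,3,:));
maxDeviation3 = max(abs(shift3 - 0.3),[],'all')
```

Both deviations are expected to be zero up to floating-point rounding.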

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'DilationFactor',2 sets the dilation factor for each convolutional filter to 2.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character vector or string scalar FMT that provides a label for each dimension of the data.

When specifying the format of a dlarray object, each character provides a label for each dimension of the data and must be one of the following:

• 'S' — Spatial

• 'C' — Channel

• 'B' — Batch (for example, samples and observations)

• 'T' — Time (for example, time steps of sequences)

• 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string

Dimension order of the weights, specified as the comma-separated pair consisting of 'WeightsFormat' and a character vector or string scalar that provides a label for each dimension of the weights.

The default value of 'WeightsFormat' depends on the task:

• 1-D convolution: 'SCU' (spatial, channel, unspecified)

• 1-D grouped convolution: 'SCUU' (spatial, channel, unspecified, unspecified)

• 2-D convolution: 'SSCU' (spatial, spatial, channel, unspecified)

• 2-D grouped convolution: 'SSCUU' (spatial, spatial, channel, unspecified, unspecified)

• 3-D convolution: 'SSSCU' (spatial, spatial, spatial, channel, unspecified)

The supported combinations of dimension labels depend on the type of convolution. For more information, see the weights argument.

Tip

The function, by default, convolves over up to three dimensions of dlX labeled 'S' (spatial). To convolve over dimensions labeled 'T' (time), specify weights with a 'T' dimension using a formatted dlarray object or by using the 'WeightsFormat' option.

Example: 'WeightsFormat','TCU'

Step size for traversing the input data, specified as the comma-separated pair consisting of 'Stride' and a numeric scalar or numeric vector. If you specify 'Stride' as a scalar, the same value is used for all spatial dimensions. If you specify 'Stride' as a vector of the same size as the number of spatial dimensions of the input data, the vector values are used for the corresponding spatial dimensions.

The default value of 'Stride' is 1.

Example: 'Stride',3

Data Types: single | double
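
The effect of the stride on the output size can be seen in a short sketch. It assumes no padding and no dilation, in which case each spatial output size is floor((inputSize - filterSize)/stride) + 1. The sizes are arbitrary values chosen for illustration.

```matlab
dlX = dlarray(rand(28,28,3,2),'SSCB');
weights = rand(3,3,3,8);

% Scalar stride: step 2 along both spatial dimensions.
dlYScalar = dlconv(dlX,weights,0,'Stride',2);
size(dlYScalar)    % 13 13 8 2, since floor((28 - 3)/2) + 1 = 13

% Vector stride: step 2 along the first spatial dimension and 3 along the second.
dlYVector = dlconv(dlX,weights,0,'Stride',[2 3]);
size(dlYVector)    % 13 9 8 2, since floor((28 - 3)/3) + 1 = 9
```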

Filter dilation factor, specified as the comma-separated pair consisting of 'DilationFactor' and one of the following.

• Numeric scalar — The same dilation factor value is applied for all spatial dimensions.

• Numeric vector — A different dilation factor value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the dilation factor applied to the ith spatial dimension.

Use the dilation factor to increase the receptive field of the filter (the area of the input that the filter can see) on the input data. Using a dilation factor corresponds to an effective filter size of filterSize + (filterSize-1)*(dilationFactor-1).

Example: 'DilationFactor',2

Data Types: single | double
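
As a worked instance of the effective filter size formula, the following sketch dilates a 3-by-3 filter by a factor of 2 and checks the resulting output size. It assumes stride 1 and no padding; the sizes are arbitrary values chosen for illustration.

```matlab
filterSize = 3;
dilationFactor = 2;
effectiveFilterSize = filterSize + (filterSize-1)*(dilationFactor-1)   % 5

dlX = dlarray(rand(28,28,3,2),'SSCB');
weights = rand(filterSize,filterSize,3,8);
dlY = dlconv(dlX,weights,0,'DilationFactor',dilationFactor);
size(dlY)   % 24 24 8 2, the same spatial size that a 5-by-5 filter gives
```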

Size of padding applied to edges of data, specified as the comma-separated pair consisting of 'Padding' and one of the following:

• 'same' — Padding size is set so that the output size is the same as the input size when the stride is 1. More generally, the output size of each spatial dimension is ceil(inputSize/stride), where inputSize is the size of the input along a spatial dimension.

• Numeric scalar — The same amount of padding is applied to both ends of all spatial dimensions.

• Numeric vector — A different amount of padding is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the size of padding applied to the start and the end along the ith spatial dimension.

• Numeric matrix — A different amount of padding is applied to the start and end of each spatial dimension. Use a matrix of size 2-by-d, where d is the number of spatial dimensions of the input data. The element (1,i) specifies the size of padding applied to the start of the ith spatial dimension. The element (2,i) specifies the size of padding applied to the end of the ith spatial dimension. For example, in 2-D, the format is [top, left; bottom, right].

Data Types: single | double
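
The padding options above can be compared through their effect on the output size. The following sketch uses a 28-by-28 input and 3-by-3 filters, both arbitrary choices for illustration. Without dilation and with stride 1, each spatial output size is the input size plus the total padding along that dimension, minus the filter size, plus 1.

```matlab
dlX = dlarray(rand(28,28,3,2),'SSCB');
weights = rand(3,3,3,8);

% 'same' with stride 2: each spatial size is ceil(28/2) = 14.
dlYSame = dlconv(dlX,weights,0,'Padding','same','Stride',2);
size(dlYSame)      % 14 14 8 2

% Scalar: one element of padding on every edge, so 28 + 2 - 3 + 1 = 28.
dlYScalar = dlconv(dlX,weights,0,'Padding',1);
size(dlYScalar)    % 28 28 8 2

% Matrix [top, left; bottom, right]: 1 top, 3 bottom, 2 left, 4 right.
dlYMatrix = dlconv(dlX,weights,0,'Padding',[1 2; 3 4]);
size(dlYMatrix)    % 30 32 8 2
```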

Value to pad data, specified as the comma-separated pair consisting of 'PaddingValue' and one of the following:

• Scalar: Pad with the specified scalar value.

$\left[\begin{array}{ccc}3& 1& 4\\ 1& 5& 9\\ 2& 6& 5\end{array}\right]\to \left[\begin{array}{ccccccc}0& 0& 0& 0& 0& 0& 0\\ 0& 0& 0& 0& 0& 0& 0\\ 0& 0& 3& 1& 4& 0& 0\\ 0& 0& 1& 5& 9& 0& 0\\ 0& 0& 2& 6& 5& 0& 0\\ 0& 0& 0& 0& 0& 0& 0\\ 0& 0& 0& 0& 0& 0& 0\end{array}\right]$

• 'symmetric-include-edge': Pad using mirrored values of the input, including the edge values.

$\left[\begin{array}{ccc}3& 1& 4\\ 1& 5& 9\\ 2& 6& 5\end{array}\right]\to \left[\begin{array}{ccccccc}5& 1& 1& 5& 9& 9& 5\\ 1& 3& 3& 1& 4& 4& 1\\ 1& 3& 3& 1& 4& 4& 1\\ 5& 1& 1& 5& 9& 9& 5\\ 6& 2& 2& 6& 5& 5& 6\\ 6& 2& 2& 6& 5& 5& 6\\ 5& 1& 1& 5& 9& 9& 5\end{array}\right]$

• 'symmetric-exclude-edge': Pad using mirrored values of the input, excluding the edge values.

$\left[\begin{array}{ccc}3& 1& 4\\ 1& 5& 9\\ 2& 6& 5\end{array}\right]\to \left[\begin{array}{ccccccc}5& 6& 2& 6& 5& 6& 2\\ 9& 5& 1& 5& 9& 5& 1\\ 4& 1& 3& 1& 4& 1& 3\\ 9& 5& 1& 5& 9& 5& 1\\ 5& 6& 2& 6& 5& 6& 2\\ 9& 5& 1& 5& 9& 5& 1\\ 4& 1& 3& 1& 4& 1& 3\end{array}\right]$

• 'replicate': Pad using repeated border elements of the input.

$\left[\begin{array}{ccc}3& 1& 4\\ 1& 5& 9\\ 2& 6& 5\end{array}\right]\to \left[\begin{array}{ccccccc}3& 3& 3& 1& 4& 4& 4\\ 3& 3& 3& 1& 4& 4& 4\\ 3& 3& 3& 1& 4& 4& 4\\ 1& 1& 1& 5& 9& 9& 9\\ 2& 2& 2& 6& 5& 5& 5\\ 2& 2& 2& 6& 5& 5& 5\\ 2& 2& 2& 6& 5& 5& 5\end{array}\right]$
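
The padding value changes only the values used to fill the padded border, not the output size. The following sketch is a hedged illustration: it assumes that the option is named 'PaddingValue' in your release, and the sizes are arbitrary values chosen for illustration.

```matlab
dlX = dlarray(rand(28,28,3,2),'SSCB');
weights = rand(5,5,3,8);

% Zero padding (scalar padding value of 0).
dlYZero = dlconv(dlX,weights,0,'Padding','same','PaddingValue',0);

% Replicate padding: border elements of the input are repeated.
dlYReplicate = dlconv(dlX,weights,0,'Padding','same','PaddingValue','replicate');

size(dlYZero)        % 28 28 8 2
size(dlYReplicate)   % 28 28 8 2
```

The two outputs are expected to differ only near the edges, where the filter overlaps the padded region.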

Output Arguments

Convolved feature map, returned as a dlarray with the same underlying data type as dlX.

If the input data dlX is a formatted dlarray, then dlY has the same format as dlX. If the input data is not a formatted dlarray, then dlY is an unformatted dlarray with the same dimension order as the input data.

The size of the 'C' (channel) dimension of dlY depends on the task.

• Convolution: Number of filters

• Grouped convolution: Number of filters per group multiplied by the number of groups

Deep Learning Convolution

The dlconv function applies sliding convolution filters to the input data. The dlconv function supports convolution in one, two, or three spatial dimensions or one time dimension. To learn more about deep learning convolution, see the definition of convolutional layer on the convolution2dLayer reference page.
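
To make the idea of sliding filters concrete, the following is a loop-based sketch of a 2-D sliding-filter computation for a single observation, written in plain MATLAB without dlarray. It assumes stride 1, no padding, no dilation, and the common deep learning convention of applying each filter without flipping it. It is an illustration of the operation, not the implementation used by dlconv, and all variable names are hypothetical.

```matlab
X = rand(6,6,3);      % height-by-width-by-channels input (one observation)
W = rand(3,3,3,4);    % filterHeight-by-filterWidth-by-numChannels-by-numFilters
b = zeros(1,4);       % one bias value per filter

[h,w,~] = size(X);
[fh,fw,~,numFilters] = size(W);
Y = zeros(h-fh+1,w-fw+1,numFilters);

for k = 1:numFilters
    for i = 1:size(Y,1)
        for j = 1:size(Y,2)
            % Multiply the filter with the input patch under it, across all
            % channels, sum the products, and add the bias of filter k.
            patch = X(i:i+fh-1,j:j+fw-1,:);
            Y(i,j,k) = sum(patch.*W(:,:,:,k),'all') + b(k);
        end
    end
end

size(Y)   % 4 4 4
```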

Introduced in R2019b