augmentedImageDatastore

Transform batches to augment image data

Description

An augmented image datastore transforms batches of training, validation, test, and prediction data, with optional preprocessing such as resizing, rotation, and reflection. Resize images to make them compatible with the input size of your deep learning network. Augment training image data with randomized preprocessing operations to help prevent the network from overfitting and memorizing the exact details of the training images.

To train a network using augmented images, supply the augmentedImageDatastore to the trainnet function. For more information, see Preprocess Images for Deep Learning.

When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.

An imageInputLayer normalizes images using the mean of the augmented images, not the mean of the original data set. This mean is calculated once for the first augmented epoch. All other epochs use the same mean, so that the average image does not change during training.

Use an augmented image datastore for efficient preprocessing of images for deep learning, including image resizing. Do not use the ReadFcn option of ImageDatastore objects. ImageDatastore allows batch reading of JPG or PNG image files using prefetching. If you set the ReadFcn option to a custom function, then ImageDatastore does not prefetch and is usually significantly slower.

By default, an augmentedImageDatastore only resizes images to fit the output size. You can configure options for additional image transformations using an imageDataAugmenter.
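
As a rough illustration of the advice above, the following sketch resizes images with an augmented image datastore rather than a custom ReadFcn. It assumes Deep Learning Toolbox is installed. The temporary folder, the file names, and the random image content are hypothetical stand-ins for a real image collection.

```matlab
% Write a few random RGB images to a temporary folder (hypothetical data).
imageFolder = tempname;
mkdir(imageFolder);
for k = 1:4
    imwrite(rand(40,50,3),fullfile(imageFolder,sprintf('img%d.png',k)));
end

% Read the files with the default ImageDatastore reader (no ReadFcn)
% and leave the resizing to the augmented image datastore.
imds = imageDatastore(imageFolder);
auimds = augmentedImageDatastore([28 28],imds);

% Read one mini-batch. The images are stored in the "input" table variable.
data = read(auimds);
firstImage = data.input{1};
disp(size(firstImage))
```

Because no DataAugmentation value is specified here, resizing is the only operation applied to each image.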

Creation

Syntax

auimds = augmentedImageDatastore(outputSize,imds)

auimds = augmentedImageDatastore(outputSize,X,Y)

auimds = augmentedImageDatastore(outputSize,X)

auimds = augmentedImageDatastore(outputSize,tbl)

auimds = augmentedImageDatastore(outputSize,tbl,responseNames)

auimds = augmentedImageDatastore(___,Name=Value)

Description

auimds = augmentedImageDatastore(outputSize,imds) creates an augmented image datastore for classification problems using images from image datastore imds. The datastore resizes images to the height and width specified by outputSize.

auimds = augmentedImageDatastore(outputSize,X,Y) creates an augmented image datastore for classification and regression problems. The array X contains the predictor variables and the array Y contains the categorical labels or numeric responses.

auimds = augmentedImageDatastore(outputSize,X) creates an augmented image datastore for predicting responses of image data in array X.

auimds = augmentedImageDatastore(outputSize,tbl) creates an augmented image datastore for classification and regression problems. The table, tbl, contains predictors and responses.

auimds = augmentedImageDatastore(outputSize,tbl,responseNames) creates an augmented image datastore for classification and regression problems. The table, tbl, contains predictors and responses. The responseNames argument specifies the response variables in tbl.

auimds = augmentedImageDatastore(___,Name=Value) also sets writable properties using name-value arguments. For example, augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop") creates an augmented image datastore that crops images from the center.
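
The array-based syntaxes can be sketched as follows, assuming Deep Learning Toolbox is installed. The arrays X and Y hold random placeholder data, and the variable names auimdsTrain and auimdsPredict are chosen only for this illustration.

```matlab
X = rand(32,32,3,10);             % ten hypothetical 32-by-32 RGB images
Y = categorical(randi(2,10,1));   % hypothetical class labels

% Images with responses, resized to 28-by-28 (the default "resize" mode).
auimdsTrain = augmentedImageDatastore([28 28],X,Y);

% Images only (for prediction), cropped from the center instead of scaled.
auimdsPredict = augmentedImageDatastore([28 28],X,OutputSizeMode="centercrop");

disp(auimdsTrain.NumObservations)
disp(auimdsPredict.OutputSizeMode)
```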

Input Arguments

`outputSize` — Size of output images
vector of two positive integers

Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).

The output images can have a third dimension that represents the color channels. However, if you specify outputSize as a three-element vector, then the datastore ignores the third element. Instead, the datastore determines the image size in the third dimension in one of these ways:

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output color channels depends on the value of ColorPreprocessing. For example, if you specify outputSize as [28 28 1] but set ColorPreprocessing to "gray2rgb", then the output images have size 28-by-28-by-3.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output images have the same number of color channels as the input images.

This argument sets the OutputSize property.
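
The following sketch checks the behavior described above for a three-element outputSize. It assumes Deep Learning Toolbox is installed and uses random grayscale data as a placeholder. According to the description, the third element of outputSize is ignored and the output images should have three channels because of the "gray2rgb" setting.

```matlab
X = rand(32,32,1,4);   % four hypothetical grayscale images

% Request [28 28 1], but convert grayscale images to RGB.
auimds = augmentedImageDatastore([28 28 1],X,ColorPreprocessing="gray2rgb");

data = read(auimds);
outSize = size(data.input{1});

disp(auimds.OutputSize)   % stored output size (height and width only)
disp(outSize)             % size of an image actually read from the datastore
```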

`imds` — Image datastore
`ImageDatastore` object

Image datastore, specified as an ImageDatastore object.

`X` — Images
4-D numeric array

Images, specified as a 4-D numeric array. The first three dimensions are the height, width, and channels, and the last dimension indexes the individual images.

`Y` — Responses for classification or regression
array of categorical responses | numeric matrix | 4-D numeric array

Responses for classification or regression, specified as one of the following:

- For a classification problem, Y is a categorical vector containing the image labels.
- For a regression problem, Y can be one of the following:
  - n-by-r numeric matrix. n is the number of observations and r is the number of responses.
  - h-by-w-by-c-by-n numeric array. h-by-w-by-c is the size of a single response and n is the number of observations.

Responses must not contain NaNs.

Data Types: categorical | double
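
A minimal sketch of the two response formats, assuming Deep Learning Toolbox is installed. All data here is random placeholder data, and the variable names are hypothetical.

```matlab
n = 6;
X = rand(28,28,1,n);   % n hypothetical grayscale images

% Classification: one categorical label per image.
Yclass = categorical(["a";"b";"a";"b";"a";"b"]);
auimdsClass = augmentedImageDatastore([28 28],X,Yclass);

% Regression: n-by-r numeric matrix, here with r = 2 responses per image.
Yreg = rand(n,2);
assert(~any(isnan(Yreg),"all"))   % responses must not contain NaNs
auimdsReg = augmentedImageDatastore([28 28],X,Yreg);

disp(auimdsClass.NumObservations)
disp(auimdsReg.NumObservations)
```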

`tbl` — Input data
`table`

Input data, specified as a table. tbl must contain the predictors in the first column as either absolute or relative image paths or images. The type and location of the responses depend on the problem:

- For a classification problem, the response must be a categorical variable containing labels for the images. If the name of the response variable is not specified in the call to augmentedImageDatastore, the responses must be in the second column. If the responses are in a different column of tbl, then you must specify the response variable name using the responseNames argument.
- For a regression problem, the responses must be numerical values in the column or columns after the first column. The responses can be either in multiple columns as scalars or in a single column as numeric vectors or cell arrays containing numeric 3-D arrays. When you do not specify the name of the response variable or variables, augmentedImageDatastore accepts the remaining columns of tbl as the response variables. You can specify the response variable names using the responseNames argument.

Responses must not contain NaN values. NaN values in the predictor data are propagated through training. However, in most cases the training then fails to converge.

Data Types: table

`responseNames` — Names of response variables in the input table
character vector | cell array of character vectors | string array

Names of the response variables in the input table, specified as one of the following:

- For classification or regression tasks with a single response, responseNames must be a character vector or string scalar containing the response variable in the input table.
- For regression tasks with multiple responses, responseNames must be a string array or cell array of character vectors containing the response variables in the input table.

Data Types: char | cell | string
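
The table syntax with responseNames might look like the sketch below. It assumes Deep Learning Toolbox is installed and that the first table column holds in-memory images in a cell array. The variable names images, angle, and label are hypothetical. Because the labels are in the third column rather than the second, the response variable is named explicitly.

```matlab
n = 4;
images = cell(n,1);
for k = 1:n
    images{k} = rand(32,32,3);   % hypothetical in-memory RGB image
end
angle = rand(n,1);                                % extra numeric column
label = categorical(["cat";"dog";"cat";"dog"]);   % class labels

% Predictors must be in the first column of the table.
tbl = table(images,angle,label);

% The labels are not in the second column, so name the response variable.
auimds = augmentedImageDatastore([28 28],tbl,"label");
disp(auimds.NumObservations)
```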

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: auimds = augmentedImageDatastore([28,28],imds,OutputSizeMode="centercrop") creates an augmented image datastore that crops images from the center.

`ColorPreprocessing` — Preprocessing color operations
`"none"` (default) | `"gray2rgb"` | `"rgb2gray"`

Preprocessing color operations performed on input grayscale or RGB images, specified as "none", "gray2rgb", or "rgb2gray". When the image datastore contains a mixture of grayscale and RGB images, use ColorPreprocessing to ensure that all output images have the number of channels required by imageInputLayer.

Note

The augmentedImageDatastore object converts RGB images to grayscale by using the rgb2gray function. If an image has three channels that do not correspond to red, green, and blue channels (such as an image in the L*a*b* color space), then using ColorPreprocessing can give poor results.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value "gray2rgb" and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.

This argument sets the ColorPreprocessing property.

Data Types: char | string

`DataAugmentation` — Preprocessing applied to input images
`"none"` (default) | `imageDataAugmenter` object

Preprocessing applied to input images, specified as an imageDataAugmenter object or "none". When DataAugmentation is "none", the datastore only resizes images to fit the output size, and does not perform additional preprocessing.

This argument sets the DataAugmentation property.
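
As a sketch of how an imageDataAugmenter is attached, the code below reads the same images twice. It assumes Deep Learning Toolbox is installed. The augmentation ranges and the random image data are arbitrary choices for illustration.

```matlab
% Random horizontal reflection and rotation by up to 10 degrees.
augmenter = imageDataAugmenter(RandXReflection=true,RandRotation=[-10 10]);

X = rand(32,32,1,8);   % eight hypothetical grayscale images
auimds = augmentedImageDatastore([28 28],X,DataAugmentation=augmenter);

% Read the same source images twice. Each read draws new random
% transformations, so the two batches generally differ.
batch1 = read(auimds);
reset(auimds);
batch2 = read(auimds);

disp(size(batch1.input{1}))
disp(class(auimds.DataAugmentation))
```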

`DispatchInBackground` — Dispatch observations in background
`false` (default) | `true`

Dispatch observations in the background during training, prediction, or classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox™.

Augmented image datastores perform background dispatching only when used with the trainnet function or with inference functions such as predict and minibatchpredict. Background dispatching does not occur when you call the read function of the datastore directly.

This argument sets the DispatchInBackground property.

`OutputSizeMode` — Method used to resize output images
`"resize"` (default) | `"centercrop"` | `"randcrop"`

Method used to resize output images, specified as one of the following.

- "resize" — Scale the image using bilinear interpolation to fit the output size.

  Note: augmentedImageDatastore uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default imresize uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- "centercrop" — Take a crop from the center of the training image. The crop has the same size as the output size.
- "randcrop" — Take a random crop from the training image. The random crop has the same size as the output size.

This argument sets the OutputSizeMode property.
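
The three modes can be compared with a short sketch like the following, assuming Deep Learning Toolbox is installed. The input images are random placeholders that are larger than the output size, so both cropping modes have room to crop.

```matlab
X = rand(64,48,1,2);   % two hypothetical 64-by-48 grayscale images
sizeModes = ["resize","centercrop","randcrop"];
outSizes = zeros(numel(sizeModes),2);

for k = 1:numel(sizeModes)
    auimds = augmentedImageDatastore([32 32],X,OutputSizeMode=sizeModes(k));
    data = read(auimds);
    % Every mode returns images of the requested size. The modes differ
    % in whether the content is scaled or cropped.
    outSizes(k,:) = size(data.input{1});
    fprintf("%s: %s\n",sizeModes(k),mat2str(outSizes(k,:)));
end
```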

Data Types: char | string

Properties

`ColorPreprocessing` — Preprocessing color operations
`"none"` (default) | `"gray2rgb"` | `"rgb2gray"`

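Preprocessing color operations performed on input grayscale or RGB images, specified as "none", "gray2rgb", or "rgb2gray". When the image datastore contains a mixture of grayscale and RGB images, use ColorPreprocessing to ensure that all output images have the number of channels required by imageInputLayer.

The datastore does not perform color preprocessing when:

- An input image already has the required number of color channels. For example, if you specify the value "gray2rgb" and an input image already has three channels, then no color preprocessing occurs.
- The input images do not have 1 or 3 channels, such as for multispectral or hyperspectral images. In this case, all input images must have the same number of channels.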

Data Types: char | string

`DataAugmentation` — Preprocessing applied to input images
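`"none"` (default) | `imageDataAugmenter` object

Preprocessing applied to input images, specified as an imageDataAugmenter object or "none". When DataAugmentation is "none", the datastore only resizes images to fit the output size, and does not perform additional preprocessing.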

`DispatchInBackground` — Dispatch observations in background
`false` (default) | `true`

Dispatch observations in the background during training, prediction, or classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox.

`MiniBatchSize` — Number of observations in each batch
`128` | positive integer

Number of observations that are returned in each batch. You can change the value of MiniBatchSize only after you create the datastore.

Training and prediction functions that specify a mini-batch size, such as trainingOptions, minibatchpredict, and testnet, do not set the MiniBatchSize property. For best performance, use the same mini-batch size for your datastore as for your training and prediction functions.

`NumObservations` — Total number of observations in the datastore
Read-only: positive integer

This property is read-only.

Total number of observations in the augmented image datastore, returned as a positive integer. The number of observations is the length of one training epoch.
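
The relationship between MiniBatchSize and NumObservations can be sketched as below, assuming Deep Learning Toolbox is installed. The data is a random placeholder, and the counter names are hypothetical.

```matlab
X = rand(28,28,1,40);   % 40 hypothetical grayscale images
auimds = augmentedImageDatastore([28 28],X);

% MiniBatchSize is set after the datastore is created.
auimds.MiniBatchSize = 16;

numBatches = 0;
numRead = 0;
while hasdata(auimds)
    data = read(auimds);               % table with up to 16 rows
    numBatches = numBatches + 1;
    numRead = numRead + height(data);
end
reset(auimds);   % return to the start of the data for the next pass

fprintf("%d batches, %d of %d observations read\n", ...
    numBatches,numRead,auimds.NumObservations);
```

With 40 observations and a mini-batch size of 16, one pass takes three reads, and the last batch is smaller than the others.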

`OutputSize` — Size of output images
vector of two positive integers

Size of output images, specified as a vector of two positive integers. The first element specifies the height (number of rows) in the output images, and the second element specifies the width (number of columns).

The OutputSize property does not indicate the number of color channels of the output images. When you read from the datastore, the output images can have a third dimension that represents the color channels:

- For input grayscale and RGB images, which have 1 or 3 color channels, the number of output channels depends on the value of ColorPreprocessing. For example, when ColorPreprocessing is "gray2rgb", the output size in the third dimension is 3. When ColorPreprocessing is "rgb2gray", the output images do not have a third dimension.
- When the input images do not have 1 or 3 color channels, such as for multispectral or hyperspectral images, the output size in the third dimension is equal to the number of color channels of the input images.

`OutputSizeMode` — Method used to resize output images
`"resize"` (default) | `"centercrop"` | `"randcrop"`

Method used to resize output images, specified as one of the following.

- "resize" — Scale the image using bilinear interpolation to fit the output size.

  Note: augmentedImageDatastore uses the bilinear interpolation method of imresize with antialiasing. Bilinear interpolation enables fast image processing while avoiding distortions such as those caused by nearest-neighbor interpolation. In contrast, by default imresize uses bicubic interpolation with antialiasing to produce a high-quality resized image at the cost of longer processing time.
- "centercrop" — Take a crop from the center of the training image. The crop has the same size as the output size.
- "randcrop" — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: char | string

Object Functions

`combine`	Combine data from multiple datastores
`hasdata`	Determine if data is available to read
`numpartitions`	Number of datastore partitions
`partition`	Partition a datastore
`partitionByIndex`	Partition `augmentedImageDatastore` according to indices
`preview`	Preview subset of data in datastore
`read`	Read data from `augmentedImageDatastore`
`readall`	Read all data in datastore
`readByIndex`	Read data specified by index from `augmentedImageDatastore`
`reset`	Reset datastore to initial state
`shuffle`	Shuffle data in `augmentedImageDatastore`
`subset`	Create subset of datastore or FileSet
`transform`	Transform datastore
`isPartitionable`	Determine whether datastore is partitionable
`isShuffleable`	Determine whether datastore is shuffleable
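
A few of these object functions are sketched below, assuming Deep Learning Toolbox is installed. The data is a random placeholder, and the variable names are chosen only for this illustration.

```matlab
X = rand(28,28,1,20);             % 20 hypothetical grayscale images
Y = categorical(randi(3,20,1));   % hypothetical class labels
auimds = augmentedImageDatastore([28 28],X,Y);

% shuffle returns a new datastore with the observations in random order.
auimdsShuffled = shuffle(auimds);

% readByIndex reads specific observations without advancing through batches.
firstThree = readByIndex(auimds,1:3);

% readall returns every observation in one table.
allData = readall(auimds);

disp(height(firstThree))
disp(height(allData))
```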

Examples

Train Network with Augmented Images

Train a convolutional neural network using augmented image data. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

Load the sample data, which consists of synthetic images of handwritten digits. XTrain is a 28-by-28-by-1-by-5000 array, where:

- 28 is the height and width of the images.
- 1 is the number of channels.
- 5000 is the number of synthetic images of handwritten digits.

labelsTrain is a categorical vector containing the labels for each observation.

load DigitsDataTrain

Set aside 1000 of the images for network validation.

idx = randperm(size(XTrain,4),1000);
XValidation = XTrain(:,:,:,idx);
XTrain(:,:,:,idx) = [];
TValidation = labelsTrain(idx);
labelsTrain(idx) = [];

Create an imageDataAugmenter object that specifies preprocessing options for image augmentation, such as resizing, rotation, translation, and reflection. Randomly translate the images up to three pixels horizontally and vertically, and rotate the images with an angle up to 20 degrees.

imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20,20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3])

imageAugmenter = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 0
     RandYReflection: 0
        RandRotation: [-20 20]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [-3 3]
    RandYTranslation: [-3 3]

Create an augmentedImageDatastore object to use for network training and specify the image output size. During training, the datastore performs image augmentation and resizes the images. The datastore augments the images without saving any images to memory. trainnet updates the network parameters and then discards the augmented images.

imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,labelsTrain,'DataAugmentation',imageAugmenter);

Specify the convolutional neural network architecture.

layers = [
    imageInputLayer(imageSize)
    
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    fullyConnectedLayer(10)
    softmaxLayer];

Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.

opts = trainingOptions('sgdm', ...
    'MaxEpochs',15, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Metrics','accuracy', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,TValidation});

Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the trainnet function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the trainnet function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.

net = trainnet(augimds,layers,"crossentropy",opts);

Tips

You can visualize many transformed images in the same figure by using the imtile function. For example, this code displays one mini-batch of transformed images from an augmented image datastore called auimds.
```
minibatch = read(auimds);
imshow(imtile(minibatch.input))
```

By default, resizing is the only preprocessing operation performed on images. Enable additional preprocessing operations by using the DataAugmentation name-value argument with an imageDataAugmenter object. Each time images are read from the augmented image datastore, a different random combination of preprocessing operations is applied to each image.

Version History

Introduced in R2018a
