How to divide image datastore into training set, validation set and test set for training a CNN network with k-fold cross validation?

20 visualizaciones (últimos 30 días)
I have a image datastore:
filefolder=fullfile("D:\folder");
Images = imageDatastore(filefolder,...
'IncludeSubfolders',true,...
'LabelSource','foldernames');
How can I divide this image datastore into training set, validation set and test set for training a CNN network with k-fold cross validation?
splitEachLabel is a command where I can split the labels accordingly but there was no option for cross validation.
Thanking You in advance.

Respuesta aceptada

Zinea
Zinea el 4 de Sept. de 2024
Editada: Zinea el 4 de Sept. de 2024
To perform k-fold cross-validation with an image datastore in MATLAB, you can manually split the data for each fold. The 'splitEachLabel' function doesn't directly support k-fold cross-validation, but you can implement it by iterating over the number of folds and creating training and validation sets for each fold. Here's a general approach to achieve this:
  1. Decide on the number of folds k you want to use for cross-validation.
  2. Randomly shuffle and partition the data into k folds.
  3. For each fold, designate one part as the validation set and the rest as the training set. The setdiff’ and ‘subset’ functions can be used for finding the training indices by excluding validation indices and partitioning the data into training and validation sets, respectively. You may refer to the following documentation links for more insights:
Here's a sample code to illustrate this process:
% Define the number of folds
k = 5;
% Get the number of images
numImages = numel(Images.Files);
% Shuffle the indices
indices = randperm(numImages);
% Calculate the number of images per fold
numImagesPerFold = floor(numImages / k);
% Iterate over each fold
for fold = 1:k
% Determine the indices for the validation set
valStart = (fold - 1) * numImagesPerFold + 1;
if fold == k
valEnd = numImages; % Include all remaining images in the last fold
else
valEnd = fold * numImagesPerFold;
end
valIndices = indices(valStart:valEnd);
% Determine the indices for the training set
trainIndices = setdiff(indices, valIndices);
% Create the training and validation datastores
trainDatastore = subset(Images, trainIndices);
valDatastore = subset(Images, valIndices);
% (Optional) Create a test datastore if needed
% testDatastore = ...; % Define your test dataset separately if needed
% Use these datastores to train and validate your CNN
% For example:
% net = trainNetwork(trainDatastore, layers, options);
% valResults = classify(net, valDatastore);
% Compute accuracy or other metrics
end
Hope this helps!

Más respuestas (1)

Govind KM
Govind KM el 4 de Sept. de 2024
Hi Bipin,
The “splitEachLabel” function can be used to initially split the image datastore into training and test sets. Following this, the “cvpartition” function can be used to create a partition for k-fold cross validation on the training set. The training, validation and testing sets can then be accessed as needed for model training, testing and validation. A sample code for this is provided below:
% Load the image datastore
filefolder=fullfile("D:\folder");
Images = imageDatastore(filefolder,'IncludeSubfolders',true,'LabelSource','foldernames');
% Split the datastore into training and test sets
[trainImds, testImds] = splitEachLabel(Images, 0.8, 'randomized');
% Split the training set into training and validation sets for k-fold cross-validation
k = 5; % Number of folds
cvp = cvpartition(trainImds.Labels, 'KFold', k);
% Access the training, validation, and test sets
for fold = 1:k
trainIdx = training(cvp, fold);
valIdx = test(cvp, fold);
trainFoldImds = subset(trainImds, trainIdx);
valFoldImds = subset(trainImds, valIdx);
% Train your CNN network using trainFoldImds and validate using valFoldImds
end
% Test your CNN network using testImds
You can refer to the documentation for more details regarding the “splitEachLabel” function, and performing cross-validation using “cvpartition”:
https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.spliteachlabel.html
For help in training a neural network, you can refer to this example, which uses the “trainnet” function with Image datastores:
Hope this helps!

Categorías

Más información sobre Image Data Workflows en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by