Adding Cross-Validation to Classification Code

I want to add cross-validation (e.g., 10-fold) to my classification code, so that each fold uses 10 percent of the data for testing.
How can I do that?
code:
clear all
close all
TrainRatio=0.8;
ValidationRatio=0.1;
folder='/Users/pooyan/Documents/normal/'; % change this path to your normal data folder
audio_files=dir(fullfile(folder,'*.ogg'));
nfileNum=length(audio_files);
%nfileNum=200
normal=[];
for i = 1:nfileNum
    normal_name = [folder audio_files(i).name];
    normal(i,:) = audioread(normal_name);
end
normal=normal';
nLabels = repelem(categorical("normal"),nfileNum,1);
folder='/Users/pooyan/Documents/anomaly/'; % change this path to your anomaly data folder
audio_files=dir(fullfile(folder,'*.ogg'));
afileNum=length(audio_files);
anomaly=[];
for i = 1:afileNum
    anomaly_name = [folder audio_files(i).name];
    anomaly(i,:) = audioread(anomaly_name);
end
anomaly=anomaly';
aLabels = repelem(categorical("anomaly"),afileNum,1);
% randomize the inputs if necessary
% normal=normal(:,randperm(nfileNum, nfileNum));
% anomaly=anomaly(:,randperm(afileNum, afileNum));
nTrainNum = round(nfileNum*TrainRatio);
aTrainNum = round(afileNum*TrainRatio);
nValidationNum = round(nfileNum*ValidationRatio);
aValidationNum = round(afileNum*ValidationRatio);
audioTrain = [normal(:,1:nTrainNum),anomaly(:,1:aTrainNum)];
labelsTrain = [nLabels(1:nTrainNum);aLabels(1:aTrainNum)];
audioValidation = [normal(:,nTrainNum+1:nTrainNum+nValidationNum),anomaly(:,aTrainNum+1:aTrainNum+aValidationNum)];
labelsValidation = [nLabels(nTrainNum+1:nTrainNum+nValidationNum);aLabels(aTrainNum+1:aTrainNum+aValidationNum)];
audioTest = [normal(:,nTrainNum+nValidationNum+1:end),anomaly(:,aTrainNum+aValidationNum+1:end)];
labelsTest = [nLabels(nTrainNum+nValidationNum+1:end); aLabels(aTrainNum+aValidationNum+1:end)];
fs=44100;
% Create an audioFeatureExtractor object
%to extract the centroid and slope of the mel spectrum over time.
aFE = audioFeatureExtractor("SampleRate",fs, ... %Fs
"SpectralDescriptorInput","melSpectrum", ...
"spectralCentroid",true, ...
"spectralSlope",true);
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain);
numHopsPerSequence;
numFeatures;
numSignals;
%treat the extracted features as sequences and use a
%sequenceInputLayer as the first layer of your deep learning model.
featuresTrain = permute(featuresTrain,[2,1,3]); %permute switching dimensions in array
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));%remove dimensions
numSignals = numel(featuresTrain); %number of signals of normal and anomalies
[numFeatures,numHopsPerSequence] = size(featuresTrain{1});
%Extract the validation features.
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));
%Define the network architecture.
layers = [ ...
sequenceInputLayer(numFeatures)
lstmLayer(50,"OutputMode","last")
fullyConnectedLayer(numel(unique(labelsTrain))) %%labelTrain=audio
softmaxLayer
classificationLayer];
%To define the training options
options = trainingOptions("adam", ...
"Shuffle","every-epoch", ...
"ValidationData",{featuresValidation,labelsValidation}, ... %%labelValidatin=audioValidation
"Plots","training-progress", ...
"Verbose",false);
%To train the network
net = trainNetwork(featuresTrain,labelsTrain,layers,options);
%Test the network %10 percent
%classify(net,permute(extract(aFE,audioTest),[2 257 35]))
TestFeature=extract(aFE, audioTest);
for i=1:size(TestFeature, 3)
    TestFeatureIn = TestFeature(:,:,i)';
    classify(net,TestFeatureIn)
end

Answers (1)

Aditya Patil
Answered 16 Nov 2020; edited 16 Nov 2020
Currently, K-fold cross-validation is not directly supported for neural networks. I have brought the issue to the notice of the concerned people.
Note that K-fold validation is not commonly used for neural networks, as they are generally trained on large amounts of data, so K-fold validation is usually not required.
You can split the dataset into 10 parts and, in each iteration, train the network on a different set of 9 parts and validate on the remaining part. You can do so using the cvpartition and training functions. For example,
load fisheriris;
data_size = size(meas);
folds = 10;
c = cvpartition(data_size(1), "KFold", folds);
for i = 1:folds
    idx = training(c, i);
    train = meas(idx, :);  % select the training rows (all columns)
    test = meas(~idx, :);  % select the held-out rows
    % train and test the model here
end
However, if the quantity of data is an issue, I would recommend other machine learning techniques such as SVMs or decision trees, as they might give you better results. You can use the Classification Learner app for this.

10 comments

Can you help me modify this code?
Aditya Patil on 16 Nov 2020
I have updated the answer to include an example of partitioning.
I see, thanks. How can I use that in my code? That modification is difficult for me; if you could explain it in terms of my code, that would be great.
Most of the code will remain the same.
Split the features and labels variables using the example provided. In trainingOptions, provide the split data instead of the entire dataset, and do the training and testing inside the for loop.
% Load the data, and define features and labels.
% Then split the data.
data_size = size(features);
folds = 10;
% Partition on the labels (or on data_size(1)); passing the feature
% matrix itself to cvpartition is an error.
c = cvpartition(labels, "KFold", folds);
for i = 1:folds
    idx = training(c, i);
    trainfeatures = features(idx); % assumes one observation per row/cell
    testfeatures = features(~idx);
    trainlabels = labels(idx);
    testlabels = labels(~idx);
    % train and test the model here
    options = trainingOptions("adam", ...
        "Shuffle","every-epoch", ...
        "ValidationData",{testfeatures,testlabels}, ...
        "Plots","training-progress", ...
        "Verbose",false);
    net = trainNetwork(trainfeatures,trainlabels,layers,options);
    % further code
end
You can also use the crossval function instead.
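For reference, a minimal crossval sketch; here features is assumed to be an N-by-P numeric matrix, labels an N-by-1 vector, and trainAndTest a hypothetical helper that trains a model and returns the test misclassification rate for one fold:

```matlab
% crossval calls the supplied function once per fold, passing the
% training and test portions of both the features and the labels.
errFun = @(XTrain, yTrain, XTest, yTest) ...
    trainAndTest(XTrain, yTrain, XTest, yTest); % hypothetical helper
foldErrors = crossval(errFun, features, labels, 'KFold', 10);
meanError = mean(foldErrors); % average error rate over the 10 folds
```

This avoids writing the partitioning loop by hand, at the cost of wrapping the training and testing steps in a single function handle.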
Thanks. How can I show the results in one confusion matrix at the end?
In my code I have train, validation, and test sets; how should I define features for size(features)?
How can I define this according to my code?
data_size = size(features);
I have modified the code as follows:
AllData = [normal anomaly];
Labels=[nLabels; aLabels];
% K indicates K-fold cross validation
K=10;
cv = cvpartition(Labels,'KFold',K);
% nTrainNum = round(nfileNum*TrainRatio*0.1);
% aTrainNum = round(afileNum*TrainRatio*0.1);
% nValidationNum = round(nfileNum*ValidationRatio*0.1);
% aValidationNum = round(afileNum*ValidationRatio*0.1);
for i=1:K
audioTest = AllData(:, cv.test(i));
labelsTest = Labels(cv.test(i));
audioTrainValidation = AllData(:, ~cv.test(i));
labelsTrainValidation = Labels(~cv.test(i));
% Vp: 10% from training dataset used for validation;
Vp=0.1;
TVL=length(labelsTrainValidation);
ValidationIndex = randperm(TVL, floor(TVL*Vp));
TrainIndex=1:TVL;
TrainIndex(ValidationIndex)=[];
audioTrain = audioTrainValidation(:, TrainIndex);
labelsTrain = labelsTrainValidation(:, TrainIndex);
audioValidation = audioTrainValidation(:, ValidationIndex);
labelsValidation = labelsTrainValidation(:, ValidationIndex);
But I still get this error:
Index in position 2 exceeds array bounds (must not exceed 1).
Error in categorical/parenReference (line 19)
that.codes = this.codes(rowIndices,colIndices);
Error in CrossAllKfold (line 58)
labelsTrain = labelsTrainValidation(:, TrainIndex);
How should I fix that?
Aditya Patil on 23 Nov 2020
Do not use length, use size instead.
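The stack trace points at two-subscript indexing: labelsTrainValidation is an N-by-1 categorical column vector, so labelsTrainValidation(:, TrainIndex) requests columns that do not exist. A minimal sketch of the fix (this matches the corrected indexing in the listing below):

```matlab
% labelsTrainValidation is N-by-1, so select rows with a single subscript
% instead of asking for columns beyond the first:
labelsTrain      = labelsTrainValidation(TrainIndex);
labelsValidation = labelsTrainValidation(ValidationIndex);
```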
Did that, thanks.
I attached the modified code. As a last step, I would like to show all the results from the cross-validation in one confusion matrix; how can I do that?
clear all
close all
TrainRatio=0.8;
ValidationRatio=0.1;
folder='/Users/pooyan/Documents/normal/'; % change this path to your normal data folder
audio_files=dir(fullfile(folder,'*.ogg'));
nfileNum=length(audio_files);
nfileNum=100; % override: use only the first 100 files for a quick test
normal=[];
for i = 1:nfileNum
    normal_name = [folder audio_files(i).name];
    normal(i,:) = audioread(normal_name);
end
normal=normal';
nLabels = repelem(categorical("normal"),nfileNum,1);
folder='/Users/pooyan/Documents/anomaly/'; % change this path to your anomaly data folder
audio_files=dir(fullfile(folder,'*.ogg'));
afileNum=length(audio_files);
anomaly=[];
for i = 1:afileNum
    anomaly_name = [folder audio_files(i).name];
    anomaly(i,:) = audioread(anomaly_name);
end
anomaly=anomaly';
aLabels = repelem(categorical("anomaly"),afileNum,1);
% randomize the inputs if necessary
%normal=normal(:,randperm(nfileNum, nfileNum));
%anomaly=anomaly(:,randperm(afileNum, afileNum));
AllData = [normal anomaly];
Labels=[nLabels; aLabels];
% K indicates K-fold cross validation
K=10;
cv = cvpartition(Labels,'KFold',K);
% nTrainNum = round(nfileNum*TrainRatio*0.1);
% aTrainNum = round(afileNum*TrainRatio*0.1);
% nValidationNum = round(nfileNum*ValidationRatio*0.1);
% aValidationNum = round(afileNum*ValidationRatio*0.1);
for i=1:K
audioTest = AllData(:, cv.test(i));
labelsTest = Labels(cv.test(i));
audioTrainValidation = AllData(:, ~cv.test(i));
labelsTrainValidation = Labels(~cv.test(i));
% Vp: 10% from training dataset used for validation;
Vp=0.1;
TVL=length(labelsTrainValidation);
ValidationIndex = randperm(TVL, floor(TVL*Vp));
TrainIndex=1:TVL;
TrainIndex(ValidationIndex)=[];
audioTrain = audioTrainValidation(:, TrainIndex);
labelsTrain = labelsTrainValidation(TrainIndex);
audioValidation = audioTrainValidation(:, ValidationIndex);
labelsValidation = labelsTrainValidation(ValidationIndex);
% audioTrain = [normal(:,((i-1)*nTrainNum)+1:i*nTrainNum),anomaly(:,((i-1)*aTrainNum)+1:i*aTrainNum)];
% labelsTrain = [nLabels(((i-1)*nTrainNum)+1:i*nTrainNum);aLabels(((i-1)*aTrainNum)+1:i*aTrainNum)];
%
% audioValidation = [normal(:,i*(nTrainNum+1:nTrainNum+nValidationNum)),anomaly(:,i*(aTrainNum+1:aTrainNum+aValidationNum))];
% labelsValidation = [nLabels(i*(nTrainNum+1):i*(nTrainNum+nValidationNum));aLabels(i*(aTrainNum+1:aTrainNum+aValidationNum))];
%
% audioTest = [normal(:,i*(nTrainNum+nValidationNum+1):end),anomaly(:,i*(aTrainNum+aValidationNum+1):end)];
% labelsTest = [nLabels(i*(nTrainNum+nValidationNum+1):end); aLabels(i*(aTrainNum+aValidationNum+1):end)];
fs=44100;
% Create an audioFeatureExtractor object
%to extract the centroid and slope of the mel spectrum over time.
aFE = audioFeatureExtractor("SampleRate",fs, ... %Fs
"SpectralDescriptorInput","melSpectrum", ...
"spectralCentroid",true, ...
"spectralSlope",true);
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain);
numHopsPerSequence;
numFeatures;
numSignals;
%treat the extracted features as sequences and use a
%sequenceInputLayer as the first layer of your deep learning model.
featuresTrain = permute(featuresTrain,[2,1,3]); %permute switching dimensions in array
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));%remove dimensions
numSignals = numel(featuresTrain); %number of signals of normal and anomalies
[numFeatures,numHopsPerSequence] = size(featuresTrain{1});
%Extract the validation features.
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));
%Define the network architecture.
layers = [ ...
sequenceInputLayer(numFeatures)
lstmLayer(50,"OutputMode","last")
fullyConnectedLayer(numel(unique(labelsTrain))) %%labelTrain=audio
softmaxLayer
classificationLayer];
%To define the training options
options = trainingOptions("adam", ...
"Shuffle","every-epoch", ...
"ValidationData",{featuresValidation,labelsValidation}, ... %%labelValidatin=audioValidation
"Plots","training-progress", ...
"Verbose",false);
%To train the network
net = trainNetwork(featuresTrain,labelsTrain,layers,options);
%Test the network %10 percent
%classify(net,permute(extract(aFE,audioTest),[2 257 35]))
TestFeature=extract(aFE, audioTest);
for j = 1:size(TestFeature, 3) % use j: i is already the fold index of the outer loop
    TestFeatureIn = TestFeature(:,:,j)';
    predict(j) = classify(net,TestFeatureIn); % note: this variable shadows the built-in predict
end
%Confusion Matrix Chart
plotconfusion(labelsTest,predict')
end
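On the last question (one combined confusion matrix): a sketch that accumulates the true and predicted labels across folds and plots a single chart after the loop. The variable names (labelsTest, predict, K) are taken from the listing above; confusionchart requires R2018b or later, and plotconfusion can be used instead:

```matlab
allTrue = categorical.empty(0,1); % ground-truth labels over all folds
allPred = categorical.empty(0,1); % predicted labels over all folds
for i = 1:K
    % ... split, train, and classify exactly as in the listing above,
    % so that labelsTest and predict hold this fold's labels ...
    allTrue = [allTrue; labelsTest];
    allPred = [allPred; predict(:)];
end
figure
confusionchart(allTrue, allPred) % one confusion matrix over all K folds
```

Plotting once after the loop, instead of inside it, gives a single matrix summarizing every test observation from all K folds.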


Asked: 10 Nov 2020
Commented: 23 Nov 2020
