10-fold cross-validation with crossval or without

Elena Casiraghi
Elena Casiraghi on 4 Apr 2020
Dear all, I'm trying to perform 10-fold holdout cross-validation to train a kNN classifier.
For each external holdout I also want to use an internal holdout to optimize the kNN hyperparameters through Bayesian optimization (which evaluates the accuracy on the internal holdouts).
Therefore, I would like to use a procedure like this one:
nfoldExtHoldout = 10;
dataMat = randi(100, 5001, 11); % training set of 5001 points, each with 11 features
labels = rand(5001,1) > 0.7; % unbalanced labels
numPts = size(dataMat,1);
knnLoss = zeros(nfoldExtHoldout,1); % preallocate the external holdout losses
for nF = 1:nfoldExtHoldout
    % external holdout: 70 percent of the samples for training, 30 percent for testing
    [trainIdx,testIdx] = crossvalind('Holdout',numPts,0.3);
    trainSet = dataMat(trainIdx,:); % randomly chosen 70 percent of the samples
    labelsTrain = labels(trainIdx); % corresponding labels for the training points
    testSet = dataMat(testIdx,:); % the remaining 30 percent of the samples
    labelsTest = labels(testIdx); % corresponding labels for the test points
    % internal holdout, defined on the training set only so that its size matches trainSet
    cv = cvpartition(numel(labelsTrain), 'Holdout', 0.3);
    % train a knn classifier through Bayesian optimization to choose the best knn parameters
    mdlknn = fitcknn(trainSet, labelsTrain, 'OptimizeHyperparameters', 'all', ...
        'HyperparameterOptimizationOptions', ...
        struct('Verbose', 0, 'UseParallel', true, 'CVPartition', cv));
    % predict test data
    testPred = predict(mdlknn, testSet);
    conf = confusionmat(labelsTest, testPred); % rows: true labels, columns: predictions
    knnLoss(nF) = 1 - sum(diag(conf))/numel(labelsTest); % error rate on the external test set
end
Is this correct?
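As a sanity check on the loss computation, here is a small worked example of how a holdout error rate follows from a confusion matrix. The ten labels and predictions are made up for illustration only. The point it shows is that the error should be normalized by the number of test points (the sum of all entries of conf), not by the size of the full data set.

```matlab
% Toy test labels and predictions (hypothetical values, 10 test points)
labelsTest = logical([1 0 0 1 0 0 0 1 0 0])';
testPred   = logical([1 0 0 0 0 1 0 1 0 0])';

% Rows are true classes, columns are predicted classes
conf = confusionmat(labelsTest, testPred);

% Correct predictions sit on the diagonal; normalize by the test set size
numTest = sum(conf(:));
holdoutLoss = 1 - sum(diag(conf)) / numTest;
```

Here two of the ten test points are misclassified, so the holdout loss comes out as 0.2.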
What's the difference between the code above and this one?
nfoldExtHoldout = 10;
rHoldout = 0.3; % holdout fraction; undefined in my original snippet, assumed 0.3 to match the partition below
dataMat = randi(100, 5001, 11); % training set of 5001 points, each with 11 features
labels = rand(5001,1) > 0.7; % unbalanced labels
numPts = size(dataMat,1);
cv = cvpartition(numPts, 'Holdout', 0.3);
% train a knn classifier through Bayesian optimization to choose the best knn parameters
mdlknn = fitcknn(dataMat, labels, 'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', ...
    struct('Verbose', 0, 'UseParallel', true, 'CVPartition', cv));
knnLoss = zeros(nfoldExtHoldout,1); % preallocate the holdout losses
for nf = 1:nfoldExtHoldout
    crossValKnn = crossval(mdlknn, 'Holdout', rHoldout);
    knnLoss(nf) = kfoldLoss(crossValKnn);
end
In other words, I don't understand what crossval is doing.
I suppose that in the second snippet the hyperparameter optimization is run just once, so the optimized hyperparameters are the same for all the external holdouts, right?
Does crossval then train a separate kNN for each holdout?
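To make the question concrete, here is a minimal sketch of what I understand crossval(mdl,'Holdout',p) to do: it draws a new holdout partition, refits a model on the training part using the settings already stored in mdl, and kfoldLoss then scores that refit model on the held-out part. No hyperparameter search is repeated. The names X and y, the smaller data set, and the fixed NumNeighbors of 5 (standing in for an optimized model) are assumptions made only to keep the example fast.

```matlab
rng(0); % fix the random seed so the partition is reproducible
X = randi(100, 500, 11);   % hypothetical data: 500 points, 11 features
y = rand(500,1) > 0.7;     % unbalanced logical labels

% Stand-in for the model returned by the hyperparameter optimization
mdl = fitcknn(X, y, 'NumNeighbors', 5);

% crossval draws a holdout partition and refits with the settings stored in mdl
cvMdl = crossval(mdl, 'Holdout', 0.3);
lossCrossval = kfoldLoss(cvMdl);

% Manual version of the same step, reusing the partition crossval created
c = cvMdl.Partition;
trIdx = training(c);   % logical index of the training observations
teIdx = test(c);       % logical index of the held-out observations
mdlManual = fitcknn(X(trIdx,:), y(trIdx), 'NumNeighbors', mdl.NumNeighbors);
lossManual = mean(predict(mdlManual, X(teIdx,:)) ~= y(teIdx));
```

If this reading is right, lossManual should agree closely with lossCrossval, and calling crossval repeatedly in a loop only changes the random partition, never the hyperparameters.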

Answers (0)
