
Repeated k-fold cross-validation

39 views (last 30 days)
James Alix on 20 Apr 2023
Commented: James Alix on 13 Jun 2023
Hi all
Does anyone know how to perform repeated k-fold cross-validation? I can implement a single k-fold (e.g. 10-fold) in many ways, e.g.
cvmdlLD = fitcdiscr(X,y,'DiscrimType','linear','KFold',10)
Or by using crossval or cvpartition, etc.
But I can't figure out how to repeat the process, shuffling the data in the folds.
For example, I would like to run the process several times: run 10-fold cross-validation once, then shuffle the data in the folds, run again, and so on for a reasonably large number of times, so as to get an idea of the variability. Ideally I would like to collect the classification performance statistics (accuracy, sensitivity, specificity, AUROC) each time (so if I looped round 50 times I would have 50 accuracies, 50 sensitivities, and so on), so that I can look at, for example, the average and standard deviation.
Thanks

Answers (1)

Himanshu on 26 Apr 2023
Hello James,
I understand that you want to perform repeated k-fold cross-validation while shuffling the data in the folds for each repetition.
Here's a general outline of the steps:
  1. Set the number of folds and repetitions for your cross-validation process. For example, you can choose 10-fold cross-validation with 50 repetitions.
  2. Create an outer loop that iterates over the number of repetitions. This will allow you to perform the cross-validation process multiple times.
  3. Within the outer loop, shuffle your data to ensure each repetition has different data in the folds. You can do this by generating a random permutation of the indices of your data points.
  4. Create a partition of your shuffled data into training and validation sets using k-fold cross-validation.
  5. Create an inner loop that iterates over the number of folds. Extract the training and validation sets for the current fold within this loop.
By following this outline, you can perform repeated k-fold cross-validation.
You can refer to the MATLAB documentation on cross-validation to learn more.
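The outline above could be sketched roughly as follows (a sketch, not tested on your data; X and y are taken from the question, and it assumes y is a numeric or categorical label vector — the other names are illustrative):

```matlab
nFolds = 10;                    % step 1: number of folds ...
nReps  = 50;                    % ... and repetitions
n   = size(X,1);
acc = zeros(nReps,1);
for r = 1:nReps                 % step 2: outer loop over repetitions
    shuffled = randperm(n);     % step 3: shuffle the data
    Xs = X(shuffled,:);
    ys = y(shuffled);
    cv = cvpartition(ys,'KFold',nFolds);   % step 4: k-fold partition
    nCorrect = 0;
    for k = 1:nFolds            % step 5: inner loop over folds
        Xtr = Xs(training(cv,k),:);  ytr = ys(training(cv,k));
        Xte = Xs(test(cv,k),:);      yte = ys(test(cv,k));
        mdl = fitcdiscr(Xtr,ytr,'DiscrimType','linear');
        nCorrect = nCorrect + sum(predict(mdl,Xte) == yte);
    end
    acc(r) = nCorrect/n;        % overall accuracy for this repetition
end
% mean(acc) and std(acc) then summarise the variability across repetitions
```

The same pattern extends to the other statistics: accumulate the fold-level predictions and compute sensitivity, specificity, etc. per repetition.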
  2 comments
James Alix on 27 Apr 2023
Hi
Thanks for responding.
I think I understand that, but I can't find a way to implement the k-fold within the inner loop. I think part of the issue is that I wish to mean-centre my training set (i.e. the left-in folds) and apply that centring to the left-out fold.
I can implement the code below, which loops through lots of repetitions, leaving a certain number of samples out each time. But this doesn't work if I have an imbalanced dataset and want to stratify the left-out set. I can't seem to get the inner loop to accept anything with a k-fold name-value pair, e.g. cvpartition, which I think is because I'm trying to keep the mean centring on the training set working. The first part of the code is below (what follows is the classifier and classification stats).
Thanks
for j = 1:500 % number of repetitions
    random_U = U(randperm(size(U,1)),:); % shuffle the sample list
    for i = 1:4:length(U) % 4 is the number of samples left out; needs to divide the number of samples
        rows = ismember(c, random_U(i:i+3,1)); % from i:i+(step-1)
        train_set_x = X;
        train_set_x(rows,:) = [];
        train_set_y = Z;
        train_set_y(rows,:) = [];
        test_set_x = X(rows,:);
        test_set_y = Z(rows,:);
        m = mean(train_set_x); % mean centre on the training set only
        train_set_x = bsxfun(@minus, train_set_x, m);
        test_set_x  = bsxfun(@minus, test_set_x, m);
        train_set_x_diseased = train_set_x(train_set_y==1,:);
        train_set_x_healthy  = train_set_x(train_set_y==2,:);
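One way to get both the stratification and the train-set-only centring (a sketch, not tested against your data; it assumes Z holds the class labels as above) is to let cvpartition build stratified folds and keep the centring inside the fold loop:

```matlab
cv = cvpartition(Z,'KFold',10,'Stratify',true);  % stratified folds
for k = 1:cv.NumTestSets
    trIdx = training(cv,k);         % logical index of the left-in folds
    teIdx = test(cv,k);             % logical index of the left-out fold
    m = mean(X(trIdx,:),1);         % centre estimated on training data only
    train_set_x = X(trIdx,:) - m;   % implicit expansion (R2016b+); bsxfun also works
    test_set_x  = X(teIdx,:) - m;   % same centre applied to the held-out fold
    train_set_y = Z(trIdx,:);
    test_set_y  = Z(teIdx,:);
    % ... classifier and classification stats as before ...
end
```

Wrapping this in an outer loop that calls cv = repartition(cv) each pass would give the repetitions with reshuffled folds.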
James Alix on 13 Jun 2023
It seems like a simple repeated k-fold should look something like this, where y is the group label (Healthy/Disease) and X is the test measurement:
rng('default')
cv = cvpartition(y,'KFold',10,'Stratify',true);
cp = classperf(y);
for k = 1:10 % repeat the cross-validation 10 times
    cv = repartition(cv);
    mdl = fitcdiscr(X,y,'DiscrimType','linear','CVPartition',cv);
    classperf(cp,y);
end
While this does loop round, it always reports perfect performance, even when I input data that doesn't separate into two groups. Any thoughts would be most welcome.
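A likely cause of the perfect score (an assumption, since the rest of the code isn't shown): classperf(cp,y) passes the true labels y in as if they were the classifier's output, so the "predictions" always match the truth. The cross-validated model's out-of-fold predictions come from kfoldPredict instead, along the lines of the sketch below (assuming y is a numeric or categorical label vector):

```matlab
rng('default')
nReps = 50;
acc = zeros(nReps,1);
cv  = cvpartition(y,'KFold',10,'Stratify',true);
for k = 1:nReps
    cv  = repartition(cv);                 % reshuffle the folds each repetition
    mdl = fitcdiscr(X,y,'DiscrimType','linear','CVPartition',cv);
    yhat   = kfoldPredict(mdl);            % out-of-fold predictions
    acc(k) = mean(yhat == y);              % honest accuracy estimate
end
% mean(acc) and std(acc) summarise the repeated-CV variability;
% confusionmat(y,yhat) gives the counts for sensitivity/specificity,
% and [~,score] = kfoldPredict(mdl) feeds perfcurve for the AUROC
```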

