How does crossval (for k-fold CV) work in MATLAB after training a classifier?


Sanjay Yadav on 7 Mar 2016


https://es.mathworks.com/matlabcentral/answers/271843-how-does-crossval-for-k-fold-cv-work-in-matlab-after-training-a-classifier

Commented: seung ho yeom on 1 Feb 2019

To my knowledge, k-fold CV is a technique for model selection in which the data is first divided into k folds, with each fold stratified by class. Now, consider the following code:

trainedClassifier = fitcnb(X, Y);
partitionedModel = crossval(trainedClassifier, 'KFold', 10);
accuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');

The above code first trains a classifier on the data in matrix X according to the class labels in vector Y. The trainedClassifier is then passed to the function crossval(). My doubt is very simple. Does this line of code

partitionedModel = crossval(trainedClassifier, 'KFold', 10);

divide the matrix X into ten folds, train on 9 folds, test on the remaining fold, and repeat this 10 times with each fold serving as the test set in turn? Or does it simply take the trainedClassifier that was trained in the previous line on the whole matrix X and test it on each fold? I ask because I can see that fitcnb has been called only once. Does the function crossval() do the retraining internally? If it doesn't, then the training is being done on the whole data instead of on the 9 folds in each iteration, as cross-validation requires.

Fellow members of the community, I will be highly obliged if this doubt of mine can be cleared. Thanking you in anticipation.


Raghav G on 22 Oct 2018

I too have the same question. Did anyone find the answer?

Fulin Wei on 28 Dec 2018

I have the same question. Do you have any answer now?


Answers (3)

Don Mathis on 30 Nov 2018


The answer is that it divides the dataset into 10 folds and trains the model 10 times, each time on 9 folds, using the remaining fold as the test set. The only information taken from 'trainedClassifier' is the set of hyperparameter values, which are used in each of the 10 trainings. 'fitcnb' is not called 10 times; 'ClassificationNaiveBayes.fit' is.
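To make the retraining concrete, here is a minimal sketch of the same procedure written out by hand. It is not the internal implementation of crossval: it calls fitcnb as a stand-in for the internal fitting routine, uses default settings only, and assumes the fisheriris sample data that ships with the Statistics and Machine Learning Toolbox. The variable names (cvp, foldModel, manualError) are illustrative.

```matlab
% Hand-written 10-fold cross-validation of a Naive Bayes classifier.
% Assumption: Statistics and Machine Learning Toolbox with the fisheriris data.
load fisheriris                      % meas: 150x4 predictors, species: 150x1 cell labels
X = meas;
Y = species;

rng(0);                              % make the random fold assignment repeatable
cvp = cvpartition(Y, 'KFold', 10);   % stratified 10-fold partition
numMisclassified = 0;

for k = 1:cvp.NumTestSets
    trainIdx = training(cvp, k);     % logical index of the 9 training folds
    testIdx  = test(cvp, k);         % logical index of the held-out fold

    % A fresh model is fit on the 9 training folds only.
    foldModel = fitcnb(X(trainIdx, :), Y(trainIdx));

    % The held-out fold is scored by a model that never saw it.
    predicted = predict(foldModel, X(testIdx, :));
    numMisclassified = numMisclassified + sum(~strcmp(predicted, Y(testIdx)));
end

manualError    = numMisclassified / numel(Y);   % pooled out-of-fold error rate
manualAccuracy = 1 - manualError;
```

Every observation is tested exactly once, and always by a model that was trained without it, which is the property the question asks about.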


Don Mathis on 15 Jan 2019


No no, hyperparameters are not obtained from the data at all. Hyperparameters are the settings that you pass to fitcnb, such as 'DistributionNames' or 'Kernel'.
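As an illustration of a setting carrying over, the sketch below passes a non-default 'DistributionNames' value to fitcnb and then inspects one of the fold models produced by crossval. It assumes the fisheriris sample data from the Statistics and Machine Learning Toolbox; the variable names are illustrative.

```matlab
% A non-default hyperparameter passed to fitcnb, then cross-validation.
% Assumption: Statistics and Machine Learning Toolbox with the fisheriris data.
load fisheriris
rng(0);                              % make the random fold assignment repeatable

% 'kernel' replaces the default 'normal' distribution for every predictor.
kernelClassifier  = fitcnb(meas, species, 'DistributionNames', 'kernel');
partitionedKernel = crossval(kernelClassifier, 'KFold', 10);

% Inspect the setting stored in the first fold model.
foldDistributions = partitionedKernel.Trained{1}.DistributionNames;
usesKernel = all(strcmp(foldDistributions, 'kernel'));
```

If the setting is inherited as described, usesKernel is true: the fold model was fit with kernel distributions even though crossval was never told about them directly.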

I will try to explain what is happening in the originally posted code:

trainedClassifier = fitcnb(X, Y);

This line fits a single Naive Bayes model to the full dataset {X,Y} using default hyperparameters (settings), such as 'normal' distributions. The result is a ClassificationNaiveBayes object.

partitionedModel = crossval(trainedClassifier, 'KFold', 10);

This line splits the data {X,Y} into 10 folds. It then trains 10 models, each trained on 9 folds. For each of these models, it uses the settings from 'trainedClassifier', namely 'normal' distributions. The result is a ClassificationPartitionedModel, which contains the 10 models that were just trained.

accuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');

This line runs each of the 10 models inside 'partitionedModel' on its own held-out test set and computes the loss for each model on its test set. These 10 individual loss values are then combined (averaged over the folds by default) to obtain the full 'kfold' loss. This loss is the classification error rate over the full dataset, because the 10 test folds together make up the full dataset. This is the "out of sample" error rate on the full dataset. 'accuracy' is then 1 minus the error rate.
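The per-fold values can be inspected directly with the 'Mode' option of kfoldLoss. The sketch below assumes the fisheriris sample data from the Statistics and Machine Learning Toolbox and makes no claim about exactly how the toolbox weights the folds when it combines them; the variable names are illustrative.

```matlab
% Per-fold losses next to the single combined k-fold loss.
% Assumption: Statistics and Machine Learning Toolbox with the fisheriris data.
load fisheriris
rng(0);                              % make the random fold assignment repeatable

trainedClassifier = fitcnb(meas, species);
partitionedModel  = crossval(trainedClassifier, 'KFold', 10);

% One misclassification rate per held-out fold.
foldLoss = kfoldLoss(partitionedModel, 'Mode', 'individual', ...
                     'LossFun', 'ClassifError');

% Default mode: a single scalar for the whole cross-validation.
overallLoss = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');
accuracy    = 1 - overallLoss;
```

Looking at foldLoss is a quick way to see how much the error estimate varies from fold to fold before trusting the single number.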

Finally, you would then use 'trainedClassifier' to make predictions on new data. 'accuracy' is now an estimate of the out-of-sample accuracy of 'trainedClassifier'. At this point, 'partitionedModel' can be discarded. Its only purpose was to provide an estimate of the out-of-sample accuracy of trainedClassifier. In fact, you cannot use 'partitionedModel' for prediction. It has no 'predict' method.
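For completeness, a short sketch of what the two objects are each used for. It assumes the fisheriris sample data from the Statistics and Machine Learning Toolbox; newObservation is a made-up measurement used only for illustration.

```matlab
% Out-of-fold predictions from the partitioned model, new-data predictions
% from the full-data model.
% Assumption: Statistics and Machine Learning Toolbox with the fisheriris data.
load fisheriris
rng(0);                              % make the random fold assignment repeatable

trainedClassifier = fitcnb(meas, species);
partitionedModel  = crossval(trainedClassifier, 'KFold', 10);

% kfoldPredict labels each observation with the fold model that did not train on it.
oofLabels   = kfoldPredict(partitionedModel);
oofAccuracy = mean(strcmp(oofLabels, species));

% The individual fold models are stored in the Trained property.
numFoldModels = numel(partitionedModel.Trained);

% New data goes to the model trained on the full dataset.
newObservation = [5.9 3.0 5.1 1.8];  % hypothetical measurement
newLabel = predict(trainedClassifier, newObservation);
```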

Don Mathis on 17 Jan 2019

Those are fit as part of the normal fitting process.

seung ho yeom on 1 Feb 2019

Okay, now I fully understand. Thank you.


fatemeh ghorbani on 3 Dec 2017


Did you find any answer?


James Ratti on 21 Oct 2018


Any answers??
