Using crossval on a trained classifier model with an imbalanced dataset
Stijn de Vries
on 28 Nov 2020
Answered: Aditya Patil
on 22 Dec 2020
I am training an SVM, an ensemble of trees and a logistic regression on some data. I used the Classification Learner app to generate the code for those classifiers and adjusted that code to my needs. Now I want to oversample, because my data is imbalanced. I am using k-fold cross-validation, so I want to balance the training set of every fold by adding duplicates of the minority-class training data. However, the functions generated by the app use the function crossval for k-fold cross-validation, and as far as I know there is no way to balance the data set inside that function.
Furthermore, I tried to get the code for crossval(), turned it into my own function, inserted it into the code for the SVM classifier and tried to train the classifier, but I got an error. I think there are two functions named crossval(). The one that MATLAB shows when you ask for its code does not accept a trained classifier as input, whereas my code calls the one that does, so I cannot look at that code and adapt it to balance every fold.
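A quick way to confirm that more than one crossval exists is MATLAB's which command with the -all option, which should list every file and class method of that name on the path: the standalone Statistics and Machine Learning Toolbox function, which takes data plus a function handle, and methods of trained classifier classes such as ClassificationSVM, which take the model itself. The exact listing depends on the release and installed toolboxes, so this is only a sketch for inspecting the path.

```matlab
% Standalone function: data plus a function handle, no trained model, e.g.
%   vals = crossval(fun, X, y);
% Method of a trained classifier: the model itself is the first argument, e.g.
%   cvmdl = crossval(mdl, 'KFold', 5);

% List every file and class method named crossval on the MATLAB path
which crossval -all
```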
Is there a way to get the code for the crossval function that accepts a trained classifier, or is there a way to oversample without changing the crossval function? I do not want to undersample because my data set is too small, and if I balance the data before cross-validation, the same duplicated case may end up in both the training and the test set.
Accepted Answer
Aditya Patil
on 22 Dec 2020
Currently, the classification functions have no option for including oversampling.
As a workaround, you can assign weights to the classes and then use cross-validation.
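The class-weighting workaround can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the data are synthetic (90 majority rows, 10 minority rows, numeric labels 0 and 1), and the weighting is done through fitcsvm's 'Prior','uniform' option, which gives both classes the same total weight during training. Because 'KFold' is passed directly, fitcsvm returns a cross-validated model, and kfoldPredict gives out-of-fold predictions for every row.

```matlab
% Synthetic, imbalanced example data (illustration only): 90 vs 10 rows
rng(0);
X = [randn(90, 2); randn(10, 2) + 2];
Y = [zeros(90, 1); ones(10, 1)];
classNames = unique(Y);

% 'Prior','uniform' gives both classes the same total weight in training,
% so the minority class is not swamped by the majority class.
% 'KFold',5 returns a 5-fold cross-validated (partitioned) model directly.
cvSVM = fitcsvm(X, Y, 'ClassNames', classNames, 'Prior', 'uniform', 'KFold', 5);

% Out-of-fold predictions: each row is predicted by the fold model that
% did not see it during training
Ypred = kfoldPredict(cvSVM);

fprintf('Overall 5-fold error: %.3f\n', mean(Ypred ~= Y));
for c = 1:numel(classNames)
    inClass = (Y == classNames(c));
    fprintf('Recall of class %d: %.3f\n', classNames(c), ...
        mean(Ypred(inClass) == classNames(c)));
end
```

fitctree and fitcensemble accept the same 'Prior' option. For a logistic regression fitted with fitglm, which has no 'Prior' option, per-observation weights passed through 'Weights' (for example, inversely proportional to class size) play the same role. If the model is trained first and then passed to crossval(mdl, 'KFold', 5), as in the app-generated code, the fold models should be retrained with the same prior, but it is worth checking this in your release.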
Alternatively, you can use cvpartition to create the partitions, oversample only the training data of each partition, and train a classifier individually on each partition.
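A minimal sketch of the cvpartition approach, assuming synthetic data with numeric labels and plain random oversampling (duplicating randomly chosen minority rows with replacement). Only the training rows of each fold are resampled, so a duplicated case can never appear in both the training and the test set. fitcsvm stands in for whatever training function the app generated; the variable names are illustrative.

```matlab
% Synthetic, imbalanced example data (illustration only): 90 vs 10 rows
rng(0);                                  % reproducible data, folds and resampling
X = [randn(90, 2); randn(10, 2) + 2];
Y = [zeros(90, 1); ones(10, 1)];

k = 5;
cvp = cvpartition(Y, 'KFold', k);        % passing the labels stratifies the folds
classNames = unique(Y);
foldError = zeros(k, 1);

for i = 1:k
    trainIdx = find(training(cvp, i));   % training rows of fold i
    testIdx  = find(test(cvp, i));       % test rows of fold i (never resampled)
    trainIdx = trainIdx(:);              % make sure it is a column vector

    % Number of training rows per class in this fold
    nPerClass = zeros(numel(classNames), 1);
    for c = 1:numel(classNames)
        nPerClass(c) = sum(Y(trainIdx) == classNames(c));
    end
    nMax = max(nPerClass);

    % Random oversampling: duplicate randomly chosen training rows of each
    % smaller class (with replacement) until every class has nMax rows
    extraIdx = zeros(0, 1);
    for c = 1:numel(classNames)
        rowsC = trainIdx(Y(trainIdx) == classNames(c));
        nExtra = nMax - numel(rowsC);
        if nExtra > 0
            extraIdx = [extraIdx; rowsC(randi(numel(rowsC), nExtra, 1))]; %#ok<AGROW>
        end
    end
    balancedIdx = [trainIdx; extraIdx];

    % Train on the balanced training set only (swap in the app-generated
    % training function here if preferred)
    mdl = fitcsvm(X(balancedIdx, :), Y(balancedIdx), 'ClassNames', classNames);

    % Evaluate on the untouched test fold
    Ypred = predict(mdl, X(testIdx, :));
    foldError(i) = mean(Ypred ~= Y(testIdx));
end

fprintf('Mean %d-fold error: %.3f\n', k, mean(foldError));
```

The same loop works for the tree ensemble or the logistic regression by replacing the fitcsvm call; the resampling step does not depend on the classifier.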