Why is the accuracy reported in the Classification Learner app much higher than the accuracy of the exported model on the held-out validation set?
I used the default 5-fold cross-validation (CV) scheme in the Classification Learner app and trained all the available models. The best model (quadratic SVM) had 74.2% accuracy. I then used
Export Model > Generate Code
and ran the generated code, again examining the 5-fold CV accuracy. Surprisingly, the validation accuracy of the exported model was only 66.8%, much lower than the accuracy reported by the app.
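Roughly, the call in the generated code looks like this (trainClassifier is the name of the generated function in my case; myTable stands in for the training table I imported into the app):

[trainedClassifier, validationAccuracy] = trainClassifier(myTable);
fprintf('5-fold CV accuracy of exported model: %.1f%%\n', 100*validationAccuracy);

The validationAccuracy returned here is the 66.8% figure mentioned above.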
This has happened repeatedly: every time I follow this workflow (with different datasets and models), the exported model's validation accuracy comes out lower.
I understand that not resetting the random number generator seed can cause some variability between runs, but the effect seems too large and too consistent to be explained by chance alone. I also understand that picking the empirically best model on the validation set, without a separate test set, can give an optimistic estimate of its performance. If the exported code uses a different CV partition, that could explain some of the drop, but I wonder whether there is another explanation I am missing.
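To rule out run-to-run partition variability on my side, a minimal check would be to fix the global random stream before each call, so that the cvpartition created inside the generated function is identical across runs (again, trainClassifier and myTable are placeholders for the generated function and my training table):

rng(0, 'twister');   % fix the seed so the 5-fold split is reproducible
[~, acc1] = trainClassifier(myTable);
rng(0, 'twister');
[~, acc2] = trainClassifier(myTable);
isequal(acc1, acc2)  % true if the CV split is the only source of randomness

Even with a fixed seed, though, this split will not necessarily match the partition the app created internally, so it only tests run-to-run variability, not the app-vs-export difference.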