How do I resolve widely different validation and test accuracy?

Sahil Bajaj on 4 Jul 2021
Edited: Prince Kumar on 19 Nov 2021
Dear experts,
I wrote a script in MATLAB to run my machine learning analysis (a classification problem). I see a consistent but strange issue in my results: I always get high, reproducible validation/training accuracy, but my test accuracy is always much lower. I checked all five tips mentioned here: https://stackoverflow.com/questions/48718663/validation-and-testing-accuracy-widely-different, but I am still unable to resolve the problem.
I would really appreciate it if someone could help me figure out the solution.
Thanks,
Sahil

Answers (1)

Prince Kumar on 19 Nov 2021
Edited: Prince Kumar on 19 Nov 2021
Hi Sahil Bajaj,
This generally happens when your model memorizes the training data instead of learning the underlying pattern. This is called overfitting.
You can try the following:
  • Use a regularization technique.
  • Make sure each set (training, validation, and test) has enough samples, e.g. a 60%/20%/20% or 70%/15%/15% split for the training, validation, and test sets, respectively.
  • Perform k-fold cross-validation.
  • Randomly shuffle the data before splitting it, so that the data distribution is nearly the same in every set. If your data is in a datastore, you can use the "shuffle" function; otherwise, you can use the "randperm" function.
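
The shuffle, split, and k-fold suggestions above can be sketched as follows. This is a minimal illustration in Python (not MATLAB) using only the standard library, and the function names shuffled_split and k_fold_indices are hypothetical. In MATLAB, the permutation step corresponds to indexing your data with randperm(n), and the Statistics and Machine Learning Toolbox function cvpartition can generate hold-out or k-fold partitions directly.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def shuffled_split(n: int, fractions: Sequence[float] = (0.7, 0.15, 0.15),
                   seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices 0..n-1, then cut them into train/val/test lists."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("need three fractions that sum to 1")
    rng = random.Random(seed)           # fixed seed -> reproducible split
    idx = list(range(n))
    rng.shuffle(idx)                    # random permutation, like randperm(n)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]        # whatever remains goes to test
    return train, val, test


def k_fold_indices(n: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists; each sample is held out exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)                    # shuffle before assigning folds
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        held_out = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, held_out
        start += size


if __name__ == "__main__":
    tr, va, te = shuffled_split(100)
    print(len(tr), len(va), len(te))  # prints: 70 15 15
    for fold, (tr, ho) in enumerate(k_fold_indices(10, k=3)):
        print(fold, len(tr), len(ho))
```

The three index lists returned by shuffled_split never overlap and together cover every sample exactly once. Fixing the seed keeps the split reproducible between runs, which makes it easier to tell whether a gap between validation and test accuracy comes from the model or from a lucky or unlucky split.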

Version: R2021a