Classification problem parsed as regression problem when Split Criterion is supplied to fitcensemble
Hi
I ran a hyperparameter optimization to find the best parameters for a two-class classification problem using fitcensemble. But when I try to use these I get a strange warning:
Warning: You must pass 'SplitCriterion' as a character vector 'mse' for regression.
What is wrong with my code? The warning appears when I use a boosting ensemble as 'Method'. When I remove 'SplitCriterion' everything works fine, but I cannot understand why MATLAB, somewhere along the line, thinks this is a regression problem when I am calling fit"c"ensemble. Here is a toy example with arbitrarily chosen settings that you can run to reproduce the warning/error.
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
t = templateTree('MaxNumSplits', 30, ...
    'MinLeafSize', 10, ...
    'SplitCriterion', 'gdi');
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1);
Don Mathis
on 3 Apr 2017
Edited: Don Mathis
on 4 Apr 2017
LogitBoost internally fits regression trees, and the 'gdi' split criterion doesn't work for regression. So the problem is that 'LogitBoost' is incompatible with 'gdi'. The warning message you get does a poor job of explaining this, however.
When I run your code, I get a useless model as output. I don't understand how an optimization could have chosen these to be the best hyperparameters when the model is unusable. Did you use the 'bayesopt' function or the 'OptimizeHyperparameters' argument to do the optimization? If so, I would like to see the code for that.
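For reference, here is a minimal sketch of the toy example with the incompatible setting removed. It relies only on what was reported above (dropping 'SplitCriterion' makes the call work) and lets LogitBoost choose the split criterion for its internal regression trees. The variable names cvEnsemble and cvLoss are just illustrative, and kfoldLoss is used only as a quick sanity check that the cross-validated model is usable.

```matlab
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable

% Leave 'SplitCriterion' unset: LogitBoost fits regression trees internally,
% so the classification criterion 'gdi' must not be forced on the learners.
t = templateTree('MaxNumSplits', 30, ...
    'MinLeafSize', 10);

% Because 'KFold' is given, the result is a cross-validated (partitioned) ensemble.
cvEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1);

% Cross-validated misclassification rate (illustrative check only).
cvLoss = kfoldLoss(cvEnsemble);
```

If the loss comes back as a sensible number between 0 and 1, the ensemble trained without the regression/classification mix-up.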
Tobias Pahlberg
on 6 Apr 2017
Don Mathis
on 6 Apr 2017
When I run an optimization I never see successes for LogitBoost+gdi, nor for GentleBoost+gdi. Those combinations fail and eventually are not tried any more. Could you post a reproducible example of an optimization that shows successes for those combinations? That would be very helpful.
In any case, there is a problem in that LogitBoost is never run with 'mse', which is the only SplitCriterion it can use. As a workaround, you might try running a separate optimization without optimizing SplitCriterion. Then you could take the best result from the 2 optimizations. Something like this:
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', {'Method', 'LearnRate', 'MinLeafSize', ...
        'MaxNumSplits', 'NumVariablesToSample'})
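To make the "take the best result from the 2 optimizations" step concrete, here is a rough sketch. It assumes the two runs are one that optimizes 'SplitCriterion' and one that does not, and it compares them by the minimum observed objective stored in each model's HyperparameterOptimizationResults. The names mdlWithSplit, mdlNoSplit, and bestMdl, the fixed random seed, and the quiet optimization options are my own choices for illustration, not something from the original code.

```matlab
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable

rng(1) % Assumption: fix the seed so both runs are repeatable
quiet = struct('ShowPlots', false, 'Verbose', 0); % Assumption: suppress plots and output

% Run 1: 'SplitCriterion' is optimized (combinations such as LogitBoost+gdi may fail).
mdlWithSplit = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', {'Method', 'LearnRate', 'MinLeafSize', ...
        'MaxNumSplits', 'SplitCriterion'}, ...
    'HyperparameterOptimizationOptions', quiet);

% Run 2: 'SplitCriterion' is left at its default, so LogitBoost can be tried properly.
mdlNoSplit = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', {'Method', 'LearnRate', 'MinLeafSize', ...
        'MaxNumSplits', 'NumVariablesToSample'}, ...
    'HyperparameterOptimizationOptions', quiet);

% Compare the best observed cross-validation loss of each run and keep the winner.
lossWithSplit = mdlWithSplit.HyperparameterOptimizationResults.MinObjective;
lossNoSplit   = mdlNoSplit.HyperparameterOptimizationResults.MinObjective;
if lossNoSplit <= lossWithSplit
    bestMdl = mdlNoSplit;
else
    bestMdl = mdlWithSplit;
end
```

Note that the two runs use their own cross-validation partitions, so a small difference between the two losses should not be over-interpreted.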
Tobias Pahlberg
on 10 Apr 2017