How to include all variables in each decision tree of an ensemble?
Hi everyone. I am fitting the following 10-tree ensemble.
X = rand(1000,50);
Y = rand(1000,1);
N = size(X,2);
Ntrees=10;
t = templateTree('NumVariablesToSample','all');
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);
Below I extract the number of variables that are included in each of the 10 trees.
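z = false(N,Ntrees);   % z(j,i) is true if predictor j is used in tree i
for i = 1:Ntrees
    % Predictor index at every node of tree i; leaf nodes are marked with 0
    idx = unique(Mdl.Trained{i}.CutPredictorIndex);
    idx(idx==0) = [];
    z(idx,i) = true;
end
sum(z)   % number of distinct predictors used in each tree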
>> ans =
8 10 8 10 9 9 10 8 9 9
Despite setting 'NumVariablesToSample' to 'all', when I extract the variables included in each tree, only 8-10 of the 50 features appear in each tree. Does anyone have a suggestion on how to force all variables to be included in all trees? Thanks.
Answers (1)
Aditya Patil
on 16 Feb 2021
'NumVariablesToSample' defines the number of variables (predictors) that are considered at any given split, not the number used by the tree as a whole. At each split, the decision tree algorithm takes a random set of predictors of that size (every predictor when the value is 'all') and then selects one of them based on the split criterion.
It might not be necessary, or sometimes even possible, to use a specific variable in a tree. For example, suppose a prior split leaves a node containing samples of only one class. In that case, no further decision boundary can be selected in that node, so the variables that have not been used yet cannot be used there.
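A related limit is worth checking: each branch node splits on exactly one predictor, so a tree can never use more distinct predictors than it has branch nodes. The sketch below counts both quantities per tree. It assumes the 'MaxNumSplits' option of templateTree is what caps tree size here; the value 200 is an arbitrary illustration, and the idea that boosted trees default to a small cap (which would match the 8-10 predictors seen in the question) should be confirmed in the templateTree documentation for your release.

```matlab
rng(0);                      % reproducible random data (illustrative only)
X = rand(1000,50);
Y = rand(1000,1);
Ntrees = 10;

% Assumption: raising 'MaxNumSplits' lets each tree grow more branch nodes.
% 200 is an arbitrary value chosen for illustration, not a recommendation.
t = templateTree('NumVariablesToSample','all','MaxNumSplits',200);
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t, ...
    'NumLearningCycles',Ntrees);

nSplits = zeros(1,Ntrees);   % branch nodes per tree
nUsed   = zeros(1,Ntrees);   % distinct predictors per tree
for i = 1:Ntrees
    idx = Mdl.Trained{i}.CutPredictorIndex;   % 0 marks a leaf node
    idx = idx(idx > 0);                       % keep branch nodes only
    nSplits(i) = numel(idx);
    nUsed(i)   = numel(unique(idx));
end

% First row: number of splits; second row: distinct predictors used
disp([nSplits; nUsed])
```

Even with deeper trees there is no guarantee that all 50 predictors appear, because the same predictor can be chosen at several nodes; the split count is only an upper bound.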
If you need to use all variables, you can look at some of the other algorithms available in MATLAB, such as SVM (fitrsvm for a regression problem like this one).
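As a rough illustration of the SVM suggestion, here is a minimal sketch that assumes the regression function fitrsvm with a linear kernel, since the question fits a regression ensemble. With a linear kernel the fitted model stores one coefficient per predictor in its Beta property, so every column of X enters the prediction. The variable name svmMdl is hypothetical.

```matlab
rng(0);                      % reproducible random data (illustrative only)
X = rand(1000,50);
Y = rand(1000,1);

% Linear-kernel SVM regression: the model is a weighted sum of all predictors
svmMdl = fitrsvm(X,Y,'KernelFunction','linear');

nCoef    = numel(svmMdl.Beta);   % one coefficient per column of X
nNonzero = nnz(svmMdl.Beta);     % coefficients that are not exactly zero
fprintf('%d coefficients, %d nonzero\n', nCoef, nNonzero);
```

A coefficient for every predictor does not mean every predictor is informative; on random data like this the coefficients mostly reflect noise.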