Cannot test model using cross-validation with crossval and kFoldLoss
I am very new to machine learning, but by following my course materials I have been able to fit a random forest to my data and get an error rate that makes sense (it beats a dumb prediction and improves with better-chosen features).
My predictor matrix (z-scored; this is a subset) is:
-0.0767889379600161 1.43666113298993 4.83220576535887 4.59650550158967
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.187208672625236 -0.00955946380486005
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
7.39424877391969 1.12643024681666 -0.145180082833503 -0.187718580390875
-0.0767889379600161 2.05712290533646 -0.211225009649084 -0.187718580390875
-0.0767889379600161 0.195737588296863 1.35584098115696 0.229434473078818
And my response is:
'Highly Active'
'Inactive'
'Inactive'
'Inactive'
'Inactive'
'Highly Active'
'Highly Active'
'Highly Active'
'Inactive'
'Highly Active'
'Inactive'
'Highly Active'
My previous method was:
rng default
c = cvpartition(catresponse, 'HoldOut', 0.3);
% Extract the indices of the training and test sets.
trainIdx = training(c);
testIdx = test(c);
% Create the training and test data sets.
XTrain = predictormatrix(trainIdx, :);
XTest = predictormatrix(testIdx, :);
yTrain = catresponse(trainIdx);
yTest = catresponse(testIdx);
% Create an ensemble of 100 trees.
forestModel = fitensemble(XTrain, yTrain, 'Bag', 100,...
'Tree', 'Type', 'Classification');
% Predict and evaluate the ensemble model.
forestPred = predict(forestModel, XTest);
% errs = forestPred ~= yTest;
% testErrRateForest = 100*sum(errs)/numel(errs);
% display(testErrRateForest)
% Perform 10-fold cross validation.
cvModel = crossval(forestModel); % 10-fold is default
cvErrorForest = 100*kfoldLoss(cvModel);
display(cvErrorForest)
% Confusion matrix.
C = confusionmat(yTest, forestPred);
figure % (figOpts is not defined in this snippet)
imagesc(C)
colorbar
colormap('cool')
[Xgrid, Ygrid] = meshgrid(1:size(C, 1));
Ctext = num2str(C(:));
text(Xgrid(:), Ygrid(:), Ctext)
labels = categories(catresponse);
set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
'XTickLabelRotation', 30, ...
'TickLabelInterpreter', 'none')
xlabel('Predicted Class')
ylabel('Known Class')
title('Forest Confusion Matrix')
Questions:
- Am I doing the cross-validation the right way? My kfoldLoss call operates on a model built from the 70% training split of the holdout partition, not on something like a cvpartition with 'KFold', so I am unsure what kfoldLoss is actually calculating here.
- Is my confusion matrix based on the cross-validation, or on the simpler holdout predictions from the code above?
- How can I alter my code so that the whole model is "cross validated"?
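For context, one common pattern (a sketch, not necessarily the only correct approach) is to run k-fold cross-validation over the full data set rather than over the 70% training split, and to build the confusion matrix from the out-of-fold predictions so that it reflects the cross-validation rather than the holdout test set:

```matlab
% Sketch: 10-fold cross-validation over the FULL data set.
% Assumes predictormatrix and catresponse from the question above.
rng default
fullModel = fitensemble(predictormatrix, catresponse, 'Bag', 100, ...
    'Tree', 'Type', 'Classification');

% crossval repartitions the data the model was trained on into 10 folds,
% retraining on 9 folds and evaluating on the held-out fold each time.
cvModel = crossval(fullModel, 'KFold', 10);
cvErrorForest = 100*kfoldLoss(cvModel);   % average misclassification rate (%)
display(cvErrorForest)

% Out-of-fold predictions give a confusion matrix that matches the CV error.
cvPred = kfoldPredict(cvModel);
C = confusionmat(catresponse, cvPred);
```

Note that calling crossval on a model that was fit only to XTrain cross-validates within that 70% split, so the earlier holdout test set never enters the calculation; that is why the sketch fits on the full data before partitioning.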