Is there a way to holdout specific data?
3 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
I'm producing decision trees (both classification and regression) of my dataset and I wish to use a specific set of data as the training and a specific set for the testing. Is there a way to do this?
For example, say my dataset is consists of 100 rows, is there a way to tell the software to compute rows 1-75 as the training set and rows 76-100 as the test set?
Thanks in advance
0 comentarios
Respuestas (1)
Udit06
el 23 de Sept. de 2024
Hi Mark,
You can use the array indexing to specify the training and testing sets using indexes. Please find below the code snippet to achieve the same:
% Define the training and testing indices
trainIndices = 1:75;
testIndices = 76:100;
% Split the data
trainData = data(trainIndices, :);
testData = data(testIndices, :);
I hope this helps.
2 comentarios
Udit06
el 23 de Sept. de 2024
Hi Mark,
When you use the fitctree function with the 'CrossVal','on' option, MATLAB automatically performs cross-validation by splitting the data into multiple folds. You can find the same on the following MathWorks documentation:
However, if you want to manually specify the training and test sets, you should handle the splitting yourself rather than relying on the built-in cross-validation. You can find the code snippet on how to train the model using manually specifying train and test sets:
% Clear workspace
clear;
% Load the ionosphere dataset
load ionosphere;
% Define training data ratio and calculate number of training samples
trainRatio = 0.7;
numTrainSamples = round(trainRatio * size(X, 1));
% Split data into training and test sets
X_train = X(1:numTrainSamples, :);
Y_train = Y(1:numTrainSamples);
X_test = X(numTrainSamples+1:end, :);
Y_test = Y(numTrainSamples+1:end, :);
% Train a decision tree classifier
MdlA = fitctree(X_train, Y_train);
% Visualize the decision tree
view(MdlA, 'Mode', 'graph');
% Predict labels for the test set
Y_pred = predict(MdlA, X_test);
% Convert cell arrays to matrices for comparison
Y_test = cell2mat(Y_test);
Y_pred = cell2mat(Y_pred);
% Calculate and display accuracy
accuracy = sum(Y_test == Y_pred) / length(Y_test);
fprintf('Test Set Accuracy: %.2f%%\n', accuracy * 100);
I hope this helps.
Ver también
Categorías
Más información sobre Get Started with Statistics and Machine Learning Toolbox en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!