
crossval

Clase: ClassificationECOC

Cross-validate multiclass, error-correcting output codes model

Syntax

CVMdl = crossval(Mdl)
CVMdl = crossval(Mdl,Name,Value)

Description

example

CVMdl = crossval(Mdl) returns a cross-validated (partitioned), multiclass, error-correcting output codes (ECOC) model (CVMdl) from a trained ECOC model (Mdl).

By default, crossval uses 10-fold cross-validation on the training data to create CVMdl.

example

CVMdl = crossval(Mdl,Name,Value) returns a partitioned ECOC model with additional options specified by one or more Name,Value pair arguments.

For example, you can specify the number of folds or a holdout sample proportion.

Input Arguments


Multiclass ECOC model, specified as a ClassificationECOC model returned by fitcecoc.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
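The example above can be sketched as follows (Mdl is assumed to be a ClassificationECOC model trained on 500 observations):

```matlab
% Create a 5-fold partition and reuse it for cross-validation
cvp = cvpartition(500,'KFold',5);          % 500 observations, 5 folds
CVMdl = crossval(Mdl,'CVPartition',cvp);   % uses the partition defined in cvp
```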

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single
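The holdout steps above can be sketched as follows (Mdl is an assumed trained ECOC model):

```matlab
% Reserve 10% of the data for validation; train on the remaining 90%
CVMdl = crossval(Mdl,'Holdout',0.1);
HoldoutMdl = CVMdl.Trained{1};   % the single compact trained model
```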

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps.

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5

Data Types: single | double
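The k-fold steps above can be sketched as follows (Mdl is an assumed trained ECOC model):

```matlab
% Partition the data into 5 folds; train 5 models, each on 4 of the folds
CVMdl = crossval(Mdl,'KFold',5);
numel(CVMdl.Trained)   % 5-by-1 cell vector of compact trained models
```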

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Ejemplo: 'Leaveout','on'
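The leave-one-out steps above can be sketched as follows (Mdl is an assumed trained ECOC model):

```matlab
% Leave-one-out: train n models, each omitting one observation
CVMdl = crossval(Mdl,'Leaveout','on');
numel(CVMdl.Trained)   % equals Mdl.NumObservations
```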

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:

  • You need a Parallel Computing Toolbox™ license.

  • Specify 'Options',statset('UseParallel',1).
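For instance, this sketch cross-validates an assumed trained ECOC model Mdl in parallel:

```matlab
% Run the cross-validation folds in parallel
% (requires a Parallel Computing Toolbox license)
options = statset('UseParallel',1);
CVMdl = crossval(Mdl,'Options',options);
```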

Output Arguments


Cross-validated ECOC model, returned as a ClassificationPartitionedECOC model.

Examples


Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',1)
t = 
Fit template for classification SVM.

                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: ''
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: 1
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'

t is an SVM template. Most of the template object's properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.

Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.

Estimate the classification error.

loss = kfoldLoss(CVMdl)
loss = 0.0400

The classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

Consider the arrhythmia data set. There are 16 classes in the study, 13 of which are represented in the data. The first class indicates that the subject did not have arrhythmia, and the last class indicates that the subject's arrhythmia state was not recorded. Suppose that the other classes are ordinal levels indicating the severity of arrhythmia.

Train an ECOC classifier with a custom coding design specified by the description of the classes.

Load the arrhythmia data set.

load arrhythmia
Y = categorical(Y);
K = numel(unique(Y)); % Number of distinct classes

Construct a coding matrix that describes the nature of the classes.

OrdMat = designecoc(11,'ordinal');
nOM = size(OrdMat);
class1VSOrd = [1; -ones(11,1); 0];
class1VSClass16 = [1; zeros(11,1); -1];
OrdVSClass16 = [0; ones(11,1); -1];
Coding = [class1VSOrd class1VSClass16 OrdVSClass16,...
    [zeros(1,nOM(2)); OrdMat; zeros(1,nOM(2))]];

Train an ECOC classifier using the custom coding design (Coding) and parallel computing. Specify to use an ensemble of 50 classification trees boosted using GentleBoost.

t = templateEnsemble('GentleBoost',50,'Tree');
options = statset('UseParallel',1);
Mdl = fitcecoc(X,Y,'Coding',Coding,'Learners',t,'Options',options);

Mdl is a ClassificationECOC model. You can access its properties using dot notation.

Cross-validate Mdl using 8-fold cross-validation and parallel computing.

rng(1); % For reproducibility
CVMdl = crossval(Mdl,'Options',options,'KFold',8);
Warning: One or more folds do not contain points from all the groups.

Since some of the classes have low relative frequency, some of the folds do not train using observations from those classes. CVMdl is a ClassificationPartitionedECOC cross-validated ECOC model.

Estimate the generalization error using parallel computing.

oosLoss = kfoldLoss(CVMdl,'Options',options)
oosLoss =

    0.3208

The out-of-sample classification error is 32%, which indicates that this model does not generalize well. To improve the model, try training using a different boosting method, such as RobustBoost, or a different algorithm altogether, such as SVM.

Tips

Assess the predictive performance of Mdl on cross-validated data using the kfold functions and properties of CVMdl, such as kfoldLoss.
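For example, after cross-validating you can compute out-of-fold predictions and loss (a sketch, assuming a trained ECOC model Mdl):

```matlab
CVMdl = crossval(Mdl);
genError = kfoldLoss(CVMdl);    % estimated generalization error
labels = kfoldPredict(CVMdl);   % out-of-fold predicted class labels
```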

Alternatives

Instead of training an ECOC model and then cross-validating it, you can create a cross-validated ECOC model directly using fitcecoc and specifying any of these name-value pair arguments: CrossVal, CVPartition, Holdout, Leaveout, or KFold.
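For example, this sketch trains and cross-validates in one call (X and Y are assumed predictor and response data):

```matlab
% One step: fit the ECOC model and 10-fold cross-validate it
CVMdl = fitcecoc(X,Y,'CrossVal','on');   % returns a ClassificationPartitionedECOC model
```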

Extended Capabilities