fitcecoc

Fit multiclass models for support vector machines or other classifiers

Syntax

Mdl = fitcecoc(Tbl,ResponseVarName)
Mdl = fitcecoc(Tbl,formula)
Mdl = fitcecoc(Tbl,Y)
Mdl = fitcecoc(X,Y)
Mdl = fitcecoc(___,Name,Value)
[Mdl,HyperparameterOptimizationResults] = fitcecoc(___,Name,Value)

Description

Mdl = fitcecoc(Tbl,ResponseVarName) returns a full, trained, multiclass, error-correcting output codes (ECOC) model using the predictors in table Tbl and the class labels in Tbl.ResponseVarName. fitcecoc uses K(K – 1)/2 binary support vector machine (SVM) models using the one-versus-one coding design, where K is the number of unique class labels (levels). Mdl is a ClassificationECOC model.

Mdl = fitcecoc(Tbl,formula) returns an ECOC model using the predictors in table Tbl and the class labels. formula is an explanatory model of the response and a subset of predictor variables in Tbl used for training.

Mdl = fitcecoc(Tbl,Y) returns an ECOC model using the predictors in table Tbl and the class labels in vector Y.

Mdl = fitcecoc(X,Y) returns a trained ECOC model using the predictors X and the class labels Y.

Mdl = fitcecoc(___,Name,Value) returns an ECOC model with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes.

For example, you can specify different binary learners, a different coding design, or cross-validation. It is good practice to cross-validate using the Kfold Name,Value pair argument. The cross-validation results indicate how well the model generalizes.
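As a minimal sketch of the cross-validation advice above, the following trains a cross-validated ECOC model directly and estimates its generalization error. It assumes the fisheriris sample data set that ships with Statistics and Machine Learning Toolbox; the choice of five folds and the variable names are illustrative only.

```matlab
load fisheriris                              % provides meas (predictors) and species (labels)
rng(1)                                       % fix the fold assignment for reproducibility
CVMdl = fitcecoc(meas,species,'KFold',5);    % returns a cross-validated (partitioned) ECOC model
cvLoss = kfoldLoss(CVMdl)                    % average out-of-fold misclassification rate
```

Because 'KFold' is specified, the returned object holds one trained model per fold rather than a single model fit to all of the data.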

[Mdl,HyperparameterOptimizationResults] = fitcecoc(___,Name,Value) also returns hyperparameter optimization details when you pass an OptimizeHyperparameters name-value pair with Learners = 'linear'. For other Learners, the HyperparameterOptimizationResults property of Mdl contains the results. Hyperparameter optimization is not available for kernel binary learners.

Examples

Train an error-correcting output codes (ECOC) multiclass model using support vector machine (SVM) binary learners.

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;

Train an ECOC multiclass model using the default options.

Mdl = fitcecoc(X,Y)
Mdl = 
  ClassificationECOC
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
               CodingName: 'onevsone'


  Properties, Methods

Mdl is a ClassificationECOC model. By default, fitcecoc uses SVM binary learners, and uses a one-versus-one coding design. You can access Mdl properties using dot notation.

Display the coding design matrix.

Mdl.ClassNames
ans = 3x1 cell array
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

CodingMat = Mdl.CodingMatrix
CodingMat = 3×3

     1     1     0
    -1     0     1
     0    -1    -1

A one-versus-one coding design on three classes yields three binary learners. Columns of CodingMat correspond to learners, and rows correspond to classes. The class order corresponds to the order in Mdl.ClassNames. For example, CodingMat(:,1) is [1; -1; 0] and indicates that the software trains the first SVM binary learner using all observations classified as 'setosa' and 'versicolor'. Since 'setosa' corresponds to 1, it is the positive class, and since 'versicolor' corresponds to -1, it is the negative class.
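To make the structure of the coding matrix concrete, the following sketch builds a one-versus-one design by hand for K = 3 classes. It is an illustration of the layout described above, not the routine that fitcecoc uses internally; the variable names are hypothetical.

```matlab
K = 3;                          % number of classes (three iris species)
pairs = nchoosek(1:K,2);        % each row is one (positive, negative) class pair
L = size(pairs,1);              % number of binary learners, K*(K-1)/2
M = zeros(K,L);                 % rows are classes, columns are learners
for j = 1:L
    M(pairs(j,1),j) =  1;       % positive class for learner j
    M(pairs(j,2),j) = -1;       % negative class for learner j
end
disp(M)                         % zeros mark classes that learner j ignores
```

For K = 3 this reproduces the matrix shown in CodingMat: the first column pairs class 1 against class 2, the second pairs class 1 against class 3, and the third pairs class 2 against class 3.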

You can access each binary learner using cell indexing and dot notation.

Mdl.BinaryLearners{1}                % The first binary learner
ans = 
  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [-1 1]
           ScoreTransform: 'none'
                     Beta: [4x1 double]
                     Bias: 1.4505
         KernelParameters: [1x1 struct]


  Properties, Methods

Compute the in-sample classification error.

isLoss = resubLoss(Mdl)
isLoss = 0.0067

The classification error is small, but the classifier might have been overfit. You can cross-validate the classifier using crossval.

Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',1)
t = 
Fit template for classification SVM.

                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: ''
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: 1
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'

t is an SVM template. Most of the template object's properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.

Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.

Estimate the classification error.

loss = kfoldLoss(CVMdl)
loss = 0.0400

The classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

Load Fisher's iris data set. Train the classifier using the petal dimensions as predictors.

load fisheriris
X = meas(:,3:4);
Y = species;
rng(1); % For reproducibility

Create an SVM template, and specify the Gaussian kernel. It is good practice to standardize the predictors.

t = templateSVM('Standardize',1,'KernelFunction','gaussian');

t is an SVM template. Most of its properties are empty. When the software trains the ECOC classifier, it sets the applicable properties to their default values.

Train the ECOC classifier using the SVM template. Transform classification scores to class posterior probabilities (which are returned by predict or resubPredict) using the 'FitPosterior' name-value pair argument. Display diagnostic messages during the training using the 'Verbose' name-value pair argument. It is good practice to specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,'FitPosterior',1,...
    'ClassNames',{'setosa','versicolor','virginica'},...
    'Verbose',2);
Training binary learner 1 (SVM) out of 3 with 50 negative and 50 positive observations.
Negative class indices: 2
Positive class indices: 1

Fitting posterior probabilities for learner 1 (SVM).
Training binary learner 2 (SVM) out of 3 with 50 negative and 50 positive observations.
Negative class indices: 3
Positive class indices: 1

Fitting posterior probabilities for learner 2 (SVM).
Training binary learner 3 (SVM) out of 3 with 50 negative and 50 positive observations.
Negative class indices: 3
Positive class indices: 2

Fitting posterior probabilities for learner 3 (SVM).

Mdl is a ClassificationECOC model. The same SVM template applies to each binary learner, but you can adjust options for each binary learner by passing in a cell vector of templates.
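The following sketch shows one way to pass a cell vector of templates so that each binary learner gets its own settings. It assumes a one-versus-one design on three classes, so the cell vector needs three templates, one per column of the coding matrix. The choice of a linear kernel for the first learner is purely illustrative.

```matlab
load fisheriris
X = meas(:,3:4);                % petal length and width
Y = species;

% One template per binary learner, in coding-matrix column order
tLinear = templateSVM('Standardize',true,'KernelFunction','linear');
tGauss  = templateSVM('Standardize',true,'KernelFunction','gaussian');

Mdl = fitcecoc(X,Y,'Learners',{tLinear,tGauss,tGauss},...
    'ClassNames',{'setosa','versicolor','virginica'});

% Inspect which kernel each trained binary learner ended up with
kernels = cellfun(@(m) m.KernelParameters.Function,Mdl.BinaryLearners,...
    'UniformOutput',false)
```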

Predict the in-sample labels and class posterior probabilities. Display diagnostic messages during the computation of labels and class posterior probabilities using the 'Verbose' name-value pair argument.

[label,~,~,Posterior] = resubPredict(Mdl,'Verbose',1);
Mdl.BinaryLoss
Predictions from all learners have been computed.
Loss for all observations has been computed.
Computing posterior probabilities...

ans =

    'quadratic'

The software assigns an observation to the class that yields the smallest average binary loss. Since all binary learners are computing posterior probabilities, the binary loss function is quadratic.
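The decision rule can be sketched by hand for a single observation. The sketch below assumes the quadratic binary loss g(y,s) = [1 - y(2s - 1)]^2/2 for scores s in [0,1], and averages the loss for each class over the learners in which that class participates. The scores in s are hypothetical values chosen for illustration, and the computation is a simplified picture rather than the exact internal code path.

```matlab
M = [ 1  1  0                   % one-versus-one coding matrix for three classes
     -1  0  1
      0 -1 -1];
s = [0.9 0.8 0.3];              % hypothetical posterior scores from the three learners

g = @(y,s) (1 - y.*(2*s - 1)).^2/2;      % quadratic binary loss
S = repmat(s,size(M,1),1);               % same scores compared against every class row

% Average binary loss per class; zero entries of M carry no weight
avgLoss = sum(abs(M).*g(M,S),2)./sum(abs(M),2);
[~,predictedClass] = min(avgLoss)        % index into the class order
```

With these scores, the first class has the smallest average binary loss (0.05, compared with 1.30 and 0.73), so the observation is assigned to it.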

Display a random set of results.

idx = randsample(size(X,1),10,1);
Mdl.ClassNames
table(Y(idx),label(idx),Posterior(idx,:),...
    'VariableNames',{'TrueLabel','PredLabel','Posterior'})
ans =

  3x1 cell array

    {'setosa'    }
    {'versicolor'}
    {'virginica' }


ans =

  10x3 table

     TrueLabel       PredLabel                    Posterior               
    ____________    ____________    ______________________________________

    'virginica'     'virginica'      0.0039321     0.0039869       0.99208
    'virginica'     'virginica'       0.017067      0.018263       0.96467
    'virginica'     'virginica'       0.014948      0.015856        0.9692
    'versicolor'    'versicolor'    2.2197e-14       0.87317       0.12683
    'setosa'        'setosa'             0.999    0.00025091    0.00074639
    'versicolor'    'virginica'     2.2195e-14      0.059429       0.94057
    'versicolor'    'versicolor'    2.2194e-14       0.97001      0.029986
    'setosa'        'setosa'             0.999     0.0002499    0.00074741
    'versicolor'    'versicolor'     0.0085646       0.98259      0.008849
    'setosa'        'setosa'             0.999    0.00025013    0.00074718

The columns of Posterior correspond to the class order of Mdl.ClassNames.

Define a grid of values in the observed predictor space. Predict the posterior probabilities for each instance in the grid.

xMax = max(X);
xMin = min(X);

x1Pts = linspace(xMin(1),xMax(1));
x2Pts = linspace(xMin(2),xMax(2));
[x1Grid,x2Grid] = meshgrid(x1Pts,x2Pts);

[~,~,~,PosteriorRegion] = predict(Mdl,[x1Grid(:),x2Grid(:)]);

For each coordinate on the grid, plot the maximum class posterior probability among all classes.

figure;
contourf(x1Grid,x2Grid,...
        reshape(max(PosteriorRegion,[],2),size(x1Grid,1),size(x1Grid,2)));
h = colorbar;
h.YLabel.String = 'Maximum posterior';
h.YLabel.FontSize = 15;
hold on
gh = gscatter(X(:,1),X(:,2),Y,'krk','*xd',8);
gh(2).LineWidth = 2;
gh(3).LineWidth = 2;

title 'Iris Petal Measurements and Maximum Posterior';
xlabel 'Petal length (cm)';
ylabel 'Petal width (cm)';
axis tight
legend(gh,'Location','NorthWest')
hold off

Train an ECOC model composed of multiple binary, linear classification models.

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

Create a default linear-classification-model template.

t = templateLinear();

To adjust the default values, see the Name-Value Pair Arguments on the templateLinear page.

Train an ECOC model composed of multiple binary, linear classification models that can identify the product given the frequency distribution of words on a documentation web page. For faster training time, transpose the predictor data, and specify that observations correspond to columns.

X = X';
rng(1); % For reproducibility 
Mdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns')
Mdl = 
  classreg.learning.classif.CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [1x13 categorical]
    ScoreTransform: 'none'
    BinaryLearners: {78x1 cell}
      CodingMatrix: [13x78 double]


  Properties, Methods

Alternatively, you can train an ECOC model composed of default linear classification models using 'Learners','Linear'.

To conserve memory, fitcecoc returns trained ECOC models composed of linear classification learners in CompactClassificationECOC model objects.

Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. Estimate the classification error using 10-fold cross-validation.

Load and inspect the arrhythmia data set.

load arrhythmia
[n,p] = size(X)
n = 452
p = 279
isLabels = unique(Y);
nLabels = numel(isLabels)
nLabels = 13
tabulate(categorical(Y))
  Value    Count   Percent
      1      245     54.20%
      2       44      9.73%
      3       15      3.32%
      4       15      3.32%
      5       13      2.88%
      6       25      5.53%
      7        3      0.66%
      8        2      0.44%
      9        9      1.99%
     10       50     11.06%
     14        4      0.88%
     15        5      1.11%
     16       22      4.87%

The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1.

Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations.

tTree = templateTree('surrogate','on');
tEnsemble = templateEnsemble('GentleBoost',100,tTree);

tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training.

Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. (The number of workers depends on your system configuration.) Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes.

pool = parpool; % Invoke workers
Starting parallel pool (parpool) using the 'local' profile ...
connected to 6 workers.
options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble,...
                'Prior','uniform','Options',options);

Mdl is a ClassificationECOC model.

Cross-validate the ECOC classifier using 10-fold cross-validation.

CVMdl = crossval(Mdl,'Options',options);
Warning: One or more folds do not contain points from all the groups.

CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented while the software trains at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}.

Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary.

oofLabel = kfoldPredict(CVMdl,'Options',options);
ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized');
ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];

This example shows how to optimize hyperparameters automatically using fitcecoc. The example uses Fisher's iris data.

Load the data.

load fisheriris
X = meas;
Y = species;

Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.

For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.

rng default
Mdl = fitcecoc(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))

|====================================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   |       Coding | BoxConstraint|  KernelScale |
|      | result |             | runtime     | (observed)  | (estim.)    |              |              |              |
|====================================================================================================================|
|    1 | Best   |     0.10667 |      3.6028 |     0.10667 |     0.10667 |     onevsone |       5.6939 |       200.36 |
|    2 | Best   |        0.08 |      10.577 |        0.08 |    0.081379 |     onevsone |       94.849 |    0.0032549 |
|    3 | Accept |        0.08 |        1.09 |        0.08 |     0.08003 |     onevsall |      0.01378 |     0.076021 |
|    4 | Accept |        0.08 |      0.6157 |        0.08 |    0.080001 |     onevsall |          889 |       38.798 |
|    5 | Best   |    0.073333 |     0.99526 |    0.073333 |    0.073337 |     onevsall |       17.142 |       1.7174 |
|    6 | Accept |        0.38 |      28.866 |    0.073333 |    0.073338 |     onevsall |      0.88995 |    0.0010029 |
|    7 | Best   |    0.046667 |     0.47562 |    0.046667 |    0.046688 |     onevsall |        4.246 |       0.3356 |
|    8 | Best   |    0.033333 |     0.84864 |    0.033333 |    0.033341 |     onevsone |      0.22406 |      0.37399 |
|    9 | Best   |    0.026667 |     0.43498 |    0.026667 |    0.026678 |     onevsone |       14.237 |       3.5166 |
|   10 | Accept |     0.33333 |     0.49517 |    0.026667 |    0.026676 |     onevsall |    0.0064689 |       999.31 |
|   11 | Accept |        0.04 |     0.53707 |    0.026667 |      0.0268 |     onevsone |        982.5 |      0.51146 |
|   12 | Accept |    0.046667 |     0.55849 |    0.026667 |    0.026694 |     onevsone |     0.018266 |     0.047347 |
|   13 | Accept |     0.10667 |      1.0967 |    0.026667 |    0.029124 |     onevsone |    0.0010243 |       13.372 |
|   14 | Accept |        0.04 |      1.2937 |    0.026667 |    0.032336 |     onevsone |       156.11 |       1.7366 |
|   15 | Accept |    0.046667 |      1.4345 |    0.026667 |      0.0327 |     onevsone |       986.23 |       10.731 |
|   16 | Accept |    0.046667 |      2.9826 |    0.026667 |    0.032045 |     onevsone |       371.63 |     0.056453 |
|   17 | Accept |        0.04 |      1.0206 |    0.026667 |    0.033569 |     onevsone |    0.0010311 |    0.0010175 |
|   18 | Accept |    0.046667 |     0.95108 |    0.026667 |    0.034256 |     onevsone |    0.0011574 |      0.16436 |
|   19 | Accept |        0.06 |      25.552 |    0.026667 |    0.032699 |     onevsall |       968.86 |       0.2494 |
|   20 | Accept |        0.04 |     0.64111 |    0.026667 |    0.031457 |     onevsone |       985.47 |       2.8942 |
|====================================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   |       Coding | BoxConstraint|  KernelScale |
|      | result |             | runtime     | (observed)  | (estim.)    |              |              |              |
|====================================================================================================================|
|   21 | Accept |        0.04 |     0.70606 |    0.026667 |     0.03134 |     onevsone |     0.001037 |    0.0044045 |
|   22 | Best   |        0.02 |     0.62679 |        0.02 |    0.023771 |     onevsone |       1.9507 |       1.3991 |
|   23 | Best   |    0.013333 |      1.0619 |    0.013333 |    0.018605 |     onevsone |      0.84926 |       1.3538 |
|   24 | Accept |    0.026667 |      1.1226 |    0.013333 |    0.021089 |     onevsone |       0.2101 |       1.5222 |
|   25 | Accept |    0.026667 |     0.61465 |    0.013333 |    0.022321 |     onevsone |       1.7108 |       1.2127 |
|   26 | Accept |     0.10667 |     0.71357 |    0.013333 |    0.022359 |     onevsone |    0.0010149 |       986.98 |
|   27 | Accept |     0.33333 |     0.67893 |    0.013333 |    0.021789 |     onevsall |    0.0010002 |        21.18 |
|   28 | Accept |    0.013333 |     0.72433 |    0.013333 |    0.019873 |     onevsone |       1.5298 |       1.6373 |
|   29 | Accept |        0.02 |     0.92909 |    0.013333 |    0.019708 |     onevsone |       1.2119 |       1.9178 |
|   30 | Accept |     0.33333 |     0.51972 |    0.013333 |    0.019544 |     onevsall |       940.08 |       979.72 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 187.1266 seconds.
Total objective function evaluation time: 91.7671

Best observed feasible point:
     Coding     BoxConstraint    KernelScale
    ________    _____________    ___________

    onevsone       0.84926         1.3538   

Observed objective function value = 0.013333
Estimated objective function value = 0.019544
Function evaluation time = 1.0619

Best estimated feasible point (according to models):
     Coding     BoxConstraint    KernelScale
    ________    _____________    ___________

    onevsone       1.5298          1.6373   

Estimated objective function value = 0.019544
Estimated function evaluation time = 0.74034
Mdl = 
  ClassificationECOC
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'setosa'  'versicolor'  'virginica'}
                       ScoreTransform: 'none'
                       BinaryLearners: {3x1 cell}
                           CodingName: 'onevsone'
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]


  Properties, Methods
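After the optimization finishes, the results can be inspected through the HyperparameterOptimizationResults property shown in the display above. The sketch below repeats the optimization with fewer evaluations and with plots and console output turned off so that it runs quickly; those settings are assumptions made for brevity, not recommendations.

```matlab
load fisheriris
rng default
Mdl = fitcecoc(meas,species,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct(...
    'AcquisitionFunctionName','expected-improvement-plus',...
    'MaxObjectiveEvaluations',10,...     % fewer evaluations than the default of 30
    'ShowPlots',false,'Verbose',0));

results = Mdl.HyperparameterOptimizationResults;   % BayesianOptimization object
results.XAtMinObjective         % hyperparameters at the best observed loss
results.MinObjective            % best observed cross-validation loss
best = bestPoint(results)       % best point under the default selection criterion
```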

Create two multiclass ECOC models trained on tall data. Use linear binary learners for one of the models and kernel binary learners for the other. Compare the resubstitution classification error of the two models.

In general, you can perform multiclass classification of tall data by using fitcecoc with linear or kernel binary learners. When you use fitcecoc to train a model on tall arrays, you cannot use SVM binary learners directly. However, you can use either linear or kernel binary classification models that use SVMs.

Create a datastore that references the folder containing Fisher's iris data set. Specify 'NA' values as missing data so that datastore replaces them with NaN values. Create tall versions of the predictor and response data.

ds = datastore('fisheriris.csv','TreatAsMissing','NA');
t = tall(ds);
Starting parallel pool (parpool) using the 'local' profile ...
connected to 6 workers.
X = [t.SepalLength t.SepalWidth t.PetalLength t.PetalWidth];
Y = t.Species;

Standardize the predictor data.

Z = zscore(X);

Train a multiclass ECOC model that uses tall data and linear binary learners. By default, when you pass tall arrays to fitcecoc, the software trains linear binary learners that use SVMs. Because the response data contains only three unique classes, change the coding scheme from one-versus-all (which is the default when you use tall data) to one-versus-one (which is the default when you use in-memory data).

rng('default') % For reproducibility
mdlLinear = fitcecoc(Z,Y,'Coding','onevsone')
Training binary learner 1 (Linear) out of 3.
Training binary learner 2 (Linear) out of 3.
Training binary learner 3 (Linear) out of 3.
mdlLinear = 
  classreg.learning.classif.CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
    BinaryLearners: {3×1 cell}
      CodingMatrix: [3×3 double]


  Properties, Methods

mdlLinear is a CompactClassificationECOC model composed of three binary learners.

Train a multiclass ECOC model that uses tall data and kernel binary learners. First, create a templateKernel object to specify the properties of the kernel binary learners; in particular, increase the number of expansion dimensions to 2^16.

tKernel = templateKernel('NumExpansionDimensions',2^16)
tKernel = 
Fit template for classification Kernel.

             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: 65536
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: []
                    Lambda: []
                   Learner: 'svm'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'

By default, the kernel binary learners use SVMs.

Pass the templateKernel object to fitcecoc and change the coding scheme to one-versus-one.

mdlKernel = fitcecoc(Z,Y,'Learners',tKernel,'Coding','onevsone')
Training binary learner 1 (Kernel) out of 3.
Training binary learner 2 (Kernel) out of 3.
Training binary learner 3 (Kernel) out of 3.
mdlKernel = 
  classreg.learning.classif.CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
    BinaryLearners: {3×1 cell}
      CodingMatrix: [3×3 double]


  Properties, Methods

mdlKernel is also a CompactClassificationECOC model composed of three binary learners.

Compare the resubstitution classification error of the two models.

errorLinear = gather(loss(mdlLinear,Z,Y))
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 1.7 sec
Evaluation completed in 2 sec
errorLinear = 0.0333
errorKernel = gather(loss(mdlKernel,Z,Y))
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 17 sec
Evaluation completed in 17 sec
errorKernel = 0.0067

mdlKernel misclassifies a smaller percentage of the training data than mdlLinear.

Input Arguments

Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not accepted.

If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.

If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, specify a formula using formula.

If Tbl does not contain the response variable, specify a response variable using Y. The length of the response variable and the number of rows in Tbl must be equal.

Note

For training linear or kernel classification models, fitcecoc does not support tables. That is, if Learners is 'linear' or 'kernel', contains a linear classification model learner template (see templateLinear), or contains a kernel classification learner template (see templateKernel), you cannot supply Tbl, ResponseVarName, or formula. Supply a matrix of predictor data (X) and an array of responses (Y) instead.

Data Types: table

Response variable name, specified as the name of a variable in Tbl.

You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.

It is good practice to specify the order of the classes by using the ClassNames name-value pair argument.

Data Types: char | string

Explanatory model of the response and a subset of the predictor variables, specified as a character vector or string scalar in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.
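As a brief sketch of the difference between naming the response and supplying a formula, the following builds a table from the fisheriris sample data and trains one model on all predictors and one on the petal measurements only. The table variable names are chosen here for illustration.

```matlab
load fisheriris
Tbl = array2table(meas,'VariableNames',...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'});
Tbl.Species = species;          % add the response as a table variable

% ResponseVarName: every remaining variable in Tbl is a predictor
MdlAll = fitcecoc(Tbl,'Species');

% formula: only the variables named on the right-hand side are predictors
MdlPetal = fitcecoc(Tbl,'Species ~ PetalLength + PetalWidth');

MdlAll.PredictorNames
MdlPetal.PredictorNames
```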

Data Types: char | string

Class labels to which the ECOC model is trained, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors.

If Y is a character array, then each element must correspond to one row of the array.

The length of Y and the number of rows of Tbl or X must be equal.

It is good practice to specify the class order using the ClassNames name-value pair argument.

Data Types: categorical | char | string | logical | single | double | cell

Predictor data, specified as a full or sparse matrix.

The length of Y and the number of observations in X must be equal.

To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument.

Note

  • For linear classification learners, if you orient X so that observations correspond to columns and specify 'ObservationsIn','columns', then you can experience a significant reduction in optimization-execution time.

  • For all other learners, orient X so that observations correspond to rows.

  • fitcecoc supports sparse matrices for training linear classification models only.

Data Types: double | single

Note

The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data. The software removes rows of X corresponding to missing values in Y. However, the treatment of missing values in X varies among binary learners. For details, see the training functions for your binary learners: fitcdiscr, fitckernel, fitcknn, fitclinear, fitcnb, fitcsvm, fitctree, or fitcensemble. Removing observations decreases the effective training or cross-validation sample size.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Learners','tree','Coding','onevsone','CrossVal','on' specifies decision trees for all binary learners, a one-versus-one coding design, and 10-fold cross-validation.

Note

You cannot use any cross-validation name-value pair argument along with the 'OptimizeHyperparameters' name-value pair argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value pair argument.
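A minimal sketch of that restriction in practice: instead of passing 'KFold' or 'CrossVal' to fitcecoc directly, the cross-validation scheme used during optimization is set through a field of the options structure. The Kfold field name and the small evaluation budget below are assumptions based on the documented optimization options; check them against your release.

```matlab
load fisheriris
rng default

% Cross-validation for the optimizer is configured here, not via 'KFold' on fitcecoc
hpoOptions = struct('Kfold',10,...           % 10-fold loss as the objective
    'MaxObjectiveEvaluations',10,...         % small budget so the sketch runs quickly
    'ShowPlots',false,'Verbose',0);

Mdl = fitcecoc(meas,species,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',hpoOptions);
```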

ECOC Classifier Options

Coding design name, specified as the comma-separated pair consisting of 'Coding' and a numeric matrix or a value in this table.

| Value | Number of Binary Learners | Description |
| --- | --- | --- |
| 'allpairs' and 'onevsone' | K(K – 1)/2 | For each binary learner, one class is positive, another is negative, and the software ignores the rest. This design exhausts all combinations of class pair assignments. |
| 'binarycomplete' | 2^(K – 1) – 1 | This design partitions the classes into all binary combinations, and does not ignore any classes. For each binary learner, all class assignments are -1 and 1 with at least one positive and negative class in the assignment. |
| 'denserandom' | Random, but approximately 10 log₂K | For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see Random Coding Design Matrices. |
| 'onevsall' | K | For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments. |
| 'ordinal' | K – 1 | For the first binary learner, the first class is negative, and the rest positive. For the second binary learner, the first two classes are negative, the rest positive, and so on. |
| 'sparserandom' | Random, but approximately 15 log₂K | For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see Random Coding Design Matrices. |
| 'ternarycomplete' | (3^K – 2^(K + 1) + 1)/2 | This design partitions the classes into all ternary combinations. All class assignments are 0, -1, and 1 with at least one positive and one negative class in the assignment. |
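The learner counts for the deterministic designs can be checked numerically. The sketch below assumes the toolbox function designecoc, which returns the K-by-L coding matrix for a named design, and a hypothetical class count K = 4.

```matlab
K = 4;  % hypothetical number of classes
names = {'onevsone','binarycomplete','onevsall','ordinal','ternarycomplete'};

% Closed-form counts for the same designs, in the same order
expected = [K*(K-1)/2, 2^(K-1)-1, K, K-1, (3^K - 2^(K+1) + 1)/2];

numLearners = zeros(size(names));
for i = 1:numel(names)
    M = designecoc(K,names{i});   % K-by-L coding matrix
    numLearners(i) = size(M,2);   % L = number of binary learners
end

disp([numLearners; expected])     % the two rows should agree
```

The random designs are omitted because their number of columns is only approximately 10 log₂K or 15 log₂K.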

You can also specify a coding design using a custom coding matrix. The custom coding matrix is a K-by-L matrix. Each row corresponds to a class and each column corresponds to a binary learner. The class order (rows) corresponds to the order in ClassNames. Compose the matrix by following these guidelines:

  • Every element of the custom coding matrix must be -1, 0, or 1, and the value must correspond to a dichotomous class assignment. This table describes the meaning of Coding(i,j), that is, the class that learner j assigns to observations in class i.

    | Value | Dichotomous Class Assignment |
    | --- | --- |
    | –1 | Learner j assigns observations in class i to a negative class. |
    | 0 | Before training, learner j removes observations in class i from the data set. |
    | 1 | Learner j assigns observations in class i to a positive class. |

  • Every column must contain at least one -1 and at least one 1, so that each binary learner sees both a negative and a positive class.

  • For all column indices i,j such that i ≠ j, Coding(:,i) cannot equal Coding(:,j) and Coding(:,i) cannot equal -Coding(:,j).

  • All rows of the custom coding matrix must be different.

For more details on the form of custom coding design matrices, see Custom Coding Design Matrices.
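A minimal sketch of building and checking a custom coding matrix before passing it to fitcecoc. The matrix below reproduces a one-versus-one design for three classes; the data set (fisheriris) and the variable names are illustrative assumptions.

```matlab
load fisheriris
classNames = {'setosa','versicolor','virginica'};   % row order of the coding matrix

% K-by-L matrix: rows are classes, columns are binary learners
M = [ 1  1  0
     -1  0  1
      0 -1 -1];

% Check the guidelines listed above
validEntries = all(ismember(M(:),[-1 0 1]));
% Each column defines a two-class problem, so it needs a positive and a negative class
twoClasses   = all(any(M == 1,1) & any(M == -1,1));
distinctRows = size(unique(M,'rows'),1) == size(M,1);
distinctCols = true;
L = size(M,2);
for i = 1:L-1
    for j = i+1:L
        if isequal(M(:,i),M(:,j)) || isequal(M(:,i),-M(:,j))
            distinctCols = false;   % duplicate or sign-flipped column
        end
    end
end

Mdl = fitcecoc(meas,species,'Coding',M,'ClassNames',classNames);
```

Specifying 'ClassNames' alongside the matrix makes the row-to-class mapping explicit instead of relying on the default class order.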

Example: 'Coding','ternarycomplete'

Data Types: char | string | double | single | int16 | int32 | int64 | int8

Flag indicating whether to transform scores to posterior probabilities, specified as the comma-separated pair consisting of 'FitPosterior' and a true (1) or false (0).

If FitPosterior is true, then the software transforms binary-learner classification scores to posterior probabilities. You can obtain posterior probabilities by using kfoldPredict, predict, or resubPredict.

fitcecoc does not support fitting posterior probabilities if:

  • The ensemble method is AdaBoostM2, LPBoost, RUSBoost, RobustBoost, or TotalBoost.

  • The binary learners (Learners) are linear or kernel classification models that implement SVM. To obtain posterior probabilities for linear or kernel classification models, implement logistic regression instead.

Example: 'FitPosterior',true

Data Types: logical
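The following sketch shows one way to retrieve posterior probabilities after training with 'FitPosterior',true. It assumes default SVM binary learners and the fisheriris data; the posterior matrix is the fourth output of resubPredict.

```matlab
load fisheriris

% Train with score-to-posterior transformation enabled
Mdl = fitcecoc(meas,species,'FitPosterior',true);

% Fourth output holds class posterior probabilities (one column per class)
[label,~,~,posterior] = resubPredict(Mdl);

disp(Mdl.ClassNames')      % column order of posterior
disp(posterior(1:3,:))     % first three observations
```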

Binary learner templates, specified as the comma-separated pair consisting of 'Learners' and a character vector, string scalar, template object, or cell vector of template objects. Specifically, you can specify binary classifiers such as SVM, and the ensembles that use GentleBoost, LogitBoost, and RobustBoost, to solve multiclass problems. However, fitcecoc also supports multiclass models as binary classifiers.

  • If Learners is a character vector or string scalar, then the software trains each binary learner using the default values of the specified algorithm. This table summarizes the available algorithms.

    | Value | Description |
    | --- | --- |
    | 'discriminant' | Discriminant analysis. For default options, see templateDiscriminant. |
    | 'kernel' | Kernel classification model. For default options, see templateKernel. |
    | 'knn' | k-nearest neighbors. For default options, see templateKNN. |
    | 'linear' | Linear classification model. For default options, see templateLinear. |
    | 'naivebayes' | Naive Bayes. For default options, see templateNaiveBayes. |
    | 'svm' | SVM. For default options, see templateSVM. |
    | 'tree' | Classification trees. For default options, see templateTree. |

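  • If Learners is a template object, then each binary learner trains according to the stored options. You can create a template object using one of the template functions named in the table above (for example, templateSVM or templateTree), or templateEnsemble for ensemble learners.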

  • If Learners is a cell vector of template objects, then:

    • Cell j corresponds to binary learner j (in other words, column j of the coding design matrix), and the cell vector must have length L. L is the number of columns in the coding design matrix. For details, see Coding.

    • To use one of the built-in loss functions for prediction, all binary learners must return a score in the same range. For example, you cannot include default SVM binary learners with default naive Bayes binary learners. The former returns a score in the range (-∞,∞), and the latter returns a posterior probability as a score. Otherwise, you must provide a custom loss as a function handle to functions such as predict and loss.

    • You cannot specify linear classification model learner templates with any other template.

    • Similarly, you cannot specify kernel classification model learner templates with any other template.

By default, the software trains learners using default SVM templates.

Example: 'Learners','tree'
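As a sketch of the template-object form, the code below customizes the SVM binary learners rather than relying on the defaults. The option values and the fisheriris data are illustrative assumptions.

```matlab
load fisheriris

% Template with nondefault SVM options; every binary learner uses these settings
t = templateSVM('Standardize',true,'KernelFunction','gaussian');

Mdl = fitcecoc(meas,species,'Learners',t);

% With the default one-versus-one design and K = 3 classes, L = K(K-1)/2 = 3
numLearners = numel(Mdl.BinaryLearners);
```

To give each binary learner different options, pass a cell vector with one template per column of the coding design matrix instead of a single template.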

Number of binary learners concurrently trained, specified as the comma-separated pair consisting of 'NumConcurrent' and a positive integer scalar. The default value is 1, which means fitcecoc trains the binary learners sequentially.

Note

This option applies only when you use fitcecoc on tall arrays. See Tall Arrays for more information.

Data Types: single | double

Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.

Note

  • For linear classification learners, if you orient X so that observations correspond to columns and specify 'ObservationsIn','columns', then you can experience a significant reduction in optimization-execution time.

  • For all other learners, orient X so that observations correspond to rows.

Example: 'ObservationsIn','columns'

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. Verbose controls the amount of diagnostic information per binary learner that the software displays in the Command Window.

This table summarizes the available verbosity level options.

| Value | Description |
| --- | --- |
| 0 | The software does not display diagnostic information. |
| 1 | The software displays diagnostic messages every time it trains a new binary learner. |
| 2 | The software displays extra diagnostic messages every time it trains a new binary learner. |

Each binary learner has its own verbosity level that is independent of this name-value pair argument. To change the verbosity level of a binary learner, create a template object and specify the 'Verbose' name-value pair argument. Then, pass the template object to fitcecoc by using the 'Learners' name-value pair argument.

Example: 'Verbose',1

Data Types: double | single

Cross-Validation Options


Flag to train a cross-validated classifier, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'.

If you specify 'on', then the software trains a cross-validated classifier with 10 folds.

You can override this cross-validation setting using one of the CVPartition, Holdout, KFold, or Leaveout name-value pair arguments. You can only use one cross-validation name-value pair argument at a time to create a cross-validated model.

Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'Crossval','on'

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
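A minimal sketch of the same idea on a concrete data set. It assumes the fisheriris data (150 observations) instead of the 500 observations in the example, and reuses one partition so that different models can be compared on identical folds.

```matlab
load fisheriris
rng('default')   % make the random partition reproducible

% 5-fold partition over the 150 observations
cvp = cvpartition(numel(species),'KFold',5);

% Cross-validate two learner types on exactly the same folds
CVSvm  = fitcecoc(meas,species,'CVPartition',cvp);
CVTree = fitcecoc(meas,species,'CVPartition',cvp,'Learners','tree');

errSvm  = kfoldLoss(CVSvm);
errTree = kfoldLoss(CVTree);
```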

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps.

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5

Data Types: single | double

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(Mdl.X,1), the software:

  1. Reserves the observation as validation data, and trains the model using the other n – 1 observations

  2. Stores the n compact, trained models in the cells of a n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.

Note

Leave-one-out is not recommended for cross-validating ECOC models composed of linear or kernel classification model learners.

Example: 'Leaveout','on'

Other Classification Options


Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the values in this table.

| Value | Description |
| --- | --- |
| Vector of positive integers | An entry in the vector is the index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable. |
| Logical vector | A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames. |
| 'all' | All predictors are categorical. |

Specification of CategoricalPredictors is appropriate if:

  • At least one predictor is categorical and all binary learners are classification trees, naive Bayes learners, SVM, or ensembles of classification trees.

  • All predictors are categorical and at least one binary learner is kNN.

If you specify CategoricalPredictors for any other learner, then the software warns that it cannot train that binary learner. For example, the software cannot train linear or kernel classification model learners using categorical predictors.

By default, if the predictor data is in a table (Tbl), fitcecoc assumes that a variable is categorical if it contains logical values, categorical values, a string array, or a cell array of character vectors. If the predictor data is a matrix (X), fitcecoc assumes all predictors are continuous. To identify any categorical predictors when the data is a matrix, use the 'CategoricalPredictors' name-value pair argument.

Example: 'CategoricalPredictors','all'

Data Types: single | double | logical | char | string | cell

Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

  • Order the classes during training.

  • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.

  • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in Y.

Example: 'ClassNames',{'b','g'}

Data Types: categorical | char | string | logical | single | double | cell
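The sketch below illustrates how 'ClassNames' fixes the class order, and therefore the column order of the outputs of predict. The reversed order and the fisheriris data are illustrative assumptions.

```matlab
load fisheriris

% Reverse of the default (alphabetical) class order
order = {'virginica','versicolor','setosa'};
Mdl = fitcecoc(meas,species,'ClassNames',order);

% Columns of negLoss follow Mdl.ClassNames, so column 3 is 'setosa'
[label,negLoss] = predict(Mdl,meas(1,:));   % first observation is a setosa flower
[~,bestColumn] = max(negLoss);
```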

Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. If you specify:

  • The square matrix Cost, then Cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.

  • The structure S, then it must have two fields:

    • S.ClassNames, which contains the class names as a variable of the same data type as Y

    • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames

The default is ones(K) - eye(K), where K is the number of distinct classes.

Example: 'Cost',[0 1 2 ; 1 0 2; 2 2 0]

Data Types: double | single | struct
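A sketch of the two ways to supply misclassification costs, using the cost matrix from the example above. The fisheriris data and the class order are assumptions made for illustration.

```matlab
load fisheriris
classNames = {'setosa','versicolor','virginica'};

% C(i,j): cost of predicting class j when the true class is i
C = [0 1 2
     1 0 2
     2 2 0];

% Form 1: square matrix, with ClassNames fixing the row and column order
MdlMatrix = fitcecoc(meas,species,'ClassNames',classNames,'Cost',C);

% Form 2: structure that carries its own class order
S.ClassNames = classNames;
S.ClassificationCosts = C;
MdlStruct = fitcecoc(meas,species,'Cost',S);
```

The structure form avoids any ambiguity about which row of the matrix belongs to which class.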

Parallel computing options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. These options require Parallel Computing Toolbox™. fitcecoc uses the 'Streams', 'UseParallel', and 'UseSubstreams' fields.

This table summarizes the available options.

| Option | Description |
| --- | --- |
| 'Streams' | A RandStream object or cell array of such objects. If you do not specify Streams, the software uses the default stream or streams. If you specify Streams, use a single object except when all of the following are true: you have an open parallel pool, UseParallel is true, and UseSubstreams is false. In that case, use a cell array of the same size as the parallel pool. If a parallel pool is not open, then the software tries to open one (depending on your preferences), and Streams must supply a single random number stream. |
| 'UseParallel' | If you have Parallel Computing Toolbox, then you can invoke a pool of workers by setting 'UseParallel',true. |
| 'UseSubstreams' | Set to true to compute in parallel using the stream specified by 'Streams'. Default is false. For example, set Streams to a type allowing substreams, such as 'mlfg6331_64' or 'mrg32k3a'. |

A best practice to ensure more predictable results is to use parpool and explicitly create a parallel pool before you invoke parallel computing using fitcecoc.

Example: 'Options',statset('UseParallel',true)

Data Types: struct

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

  • If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.

    • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

    • By default, PredictorNames is {'x1','x2',...}.

  • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcecoc uses only the predictor variables in PredictorNames and the response variable in training.

    • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.

    • By default, PredictorNames contains the names of all predictor variables.

    • It is a good practice to specify the predictors for training using either 'PredictorNames' or formula only.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: string | cell

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table.

| Value | Description |
| --- | --- |
| 'empirical' | The class prior probabilities are the class relative frequencies in Y. |
| 'uniform' | All class prior probabilities are equal to 1/K, where K is the number of classes. |
| numeric vector | Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1. |
| structure | A structure S with two fields: S.ClassNames contains the class names as a variable of the same type as Y, and S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1. |

For more details on how the software incorporates class prior probabilities, see Prior Probabilities and Cost.

Example: struct('ClassNames',{{'setosa','versicolor','virginica'}},'ClassProbs',1:3)

Data Types: single | double | char | string | struct
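As a sketch, the structure from the example above can be passed directly to fitcecoc. The unnormalized values 1:3 should come back normalized to sum to 1 in the Prior property of the trained model; the fisheriris data is an illustrative assumption.

```matlab
load fisheriris

% Unnormalized priors; the double braces keep ClassNames as a single cell-array field
prior = struct('ClassNames',{{'setosa','versicolor','virginica'}}, ...
               'ClassProbs',1:3);

Mdl = fitcecoc(meas,species,'Prior',prior);

disp(Mdl.ClassNames')   % class order used by the model
disp(Mdl.Prior)         % normalized prior probabilities
```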

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar.

  • If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.

  • If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

Example: 'ResponseName','response'

Data Types: char | string

Score transformation, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector, string scalar, or function handle.

This table summarizes the available character vectors and string scalars.

| Value | Description |
| --- | --- |
| 'doublelogit' | 1/(1 + e^(–2x)) |
| 'invlogit' | log(x / (1 – x)) |
| 'ismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| 'logit' | 1/(1 + e^(–x)) |
| 'none' or 'identity' | x (no transformation) |
| 'sign' | –1 for x < 0, 0 for x = 0, 1 for x > 0 |
| 'symmetric' | 2x – 1 |
| 'symmetricismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
| 'symmetriclogit' | 2/(1 + e^(–x)) – 1 |

For a MATLAB® function or a function you define, use its function handle for score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: char | string | function_handle
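For illustration, several of the built-in transformations can be written as function handles that accept a matrix and return a matrix of the same size. This is only a sketch assuming the usual definitions (for example, logit as 1/(1 + e^(–x))); the handle names and the score values are hypothetical.

```matlab
scores = [-2 0 3; 1 -1 0.5];   % hypothetical raw scores, one row per observation

logitFun       = @(x) 1./(1 + exp(-x));       % 'logit'
doubleLogitFun = @(x) 1./(1 + exp(-2*x));     % 'doublelogit'
symmetricFun   = @(x) 2*x - 1;                % 'symmetric'
symLogitFun    = @(x) 2./(1 + exp(-x)) - 1;   % 'symmetriclogit'
signFun        = @(x) sign(x);                % 'sign'

transformed = logitFun(scores);   % same size as scores, values in (0,1)
```

A handle of this shape is what the function-handle form of 'ScoreTransform' expects.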

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model.

The software normalizes Weights to sum up to the value of the prior probability in the respective class.

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

Data Types: double | single | char | string

Hyperparameter Optimization


Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:

  • 'none' — Do not optimize.

  • 'auto' — Use {'Coding'} along with the default parameters for the specified Learners:

    • Learners = 'svm' (default): {'BoxConstraint','KernelScale'}

    • Learners = 'discriminant': {'Delta','Gamma'}

    • Learners = 'knn': {'Distance','NumNeighbors'}

    • Learners = 'linear': {'Lambda','Learner'}

    • Learners = 'naivebayes': {'DistributionNames','Width'}

    • Learners = 'tree': {'MinLeafSize'}

  • 'all' — Optimize all eligible parameters.

  • String array or cell array of eligible parameter names

  • Vector of optimizableVariable objects, typically the output of hyperparameters

The optimization attempts to minimize the cross-validation loss (error) for fitcecoc by varying the parameters. For information about cross-validation loss in a different context, see Classification Loss. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note

'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.

The eligible parameters for fitcecoc are:

  • Coding: fitcecoc searches among 'onevsall' and 'onevsone'.

  • The eligible hyperparameters for the chosen Learners:

    An asterisk (*) marks the hyperparameters that are optimized by default, that is, those listed under 'auto' above.

    | Learners | Eligible Hyperparameters | Default Range |
    | --- | --- | --- |
    | 'discriminant' | Delta* | Log-scaled in the range [1e-6,1e3] |
    | | DiscrimType | 'linear', 'quadratic', 'diagLinear', 'diagQuadratic', 'pseudoLinear', and 'pseudoQuadratic' |
    | | Gamma* | Real values in [0,1] |
    | 'knn' | Distance* | 'cityblock', 'chebychev', 'correlation', 'cosine', 'euclidean', 'hamming', 'jaccard', 'mahalanobis', 'minkowski', 'seuclidean', and 'spearman' |
    | | DistanceWeight | 'equal', 'inverse', and 'squaredinverse' |
    | | Exponent | Positive values in [0.5,3] |
    | | NumNeighbors* | Positive integer values log-scaled in the range [1, max(2,round(NumObservations/2))] |
    | | Standardize | 'true' and 'false' |
    | 'linear' | Lambda* | Positive values log-scaled in the range [1e-5/NumObservations,1e5/NumObservations] |
    | | Learner* | 'svm' and 'logistic' |
    | | Regularization | 'ridge' and 'lasso' |
    | 'naivebayes' | DistributionNames* | 'normal' and 'kernel' |
    | | Width* | Positive values log-scaled in the range [MinPredictorDiff/4,max(MaxPredictorRange,MinPredictorDiff)] |
    | | Kernel | 'normal', 'box', 'epanechnikov', and 'triangle' |
    | 'svm' | BoxConstraint* | Positive values log-scaled in the range [1e-3,1e3] |
    | | KernelScale* | Positive values log-scaled in the range [1e-3,1e3] |
    | | KernelFunction | 'gaussian', 'linear', and 'polynomial' |
    | | PolynomialOrder | Integers in the range [2,4] |
    | | Standardize | 'true' and 'false' |
    | 'tree' | MaxNumSplits | Integers log-scaled in the range [1,max(2,NumObservations-1)] |
    | | MinLeafSize* | Integers log-scaled in the range [1,max(2,floor(NumObservations/2))] |
    | | NumVariablesToSample | Integers in the range [1,max(2,NumPredictors)] |
    | | SplitCriterion | 'gdi', 'deviance', and 'twoing' |

    Alternatively, use hyperparameters with your chosen Learners, such as

    load fisheriris % hyperparameters requires data and learner
    params = hyperparameters('fitcecoc',meas,species,'svm');

    To see the eligible and default hyperparameters, examine params.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load fisheriris
params = hyperparameters('fitcecoc',meas,species,'svm');
params(2).Range = [1e-4,1e6];

Pass params as the value of OptimizeHyperparameters.
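Putting these steps together, a complete call might look like the sketch below. The evaluation budget of 10 and the suppressed plots and display are illustrative choices to keep the run short, not recommended settings.

```matlab
load fisheriris
rng('default')   % make the optimization reproducible

% Start from the eligible hyperparameters and widen one range
params = hyperparameters('fitcecoc',meas,species,'svm');
params(2).Range = [1e-4,1e6];

% Optimize with the modified variables (Bayesian optimization by default)
Mdl = fitcecoc(meas,species, ...
    'OptimizeHyperparameters',params, ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',10,'ShowPlots',false,'Verbose',0));

results = Mdl.HyperparameterOptimizationResults;
```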

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.

For an example, see Optimize ECOC Classifier.

Example: 'auto'

Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

| Field Name | Values | Default |
| --- | --- | --- |
| Optimizer | 'bayesopt' (use Bayesian optimization; internally, this setting calls bayesopt), 'gridsearch' (use grid search with NumGridDivisions values per dimension), or 'randomsearch' (search at random among MaxObjectiveEvaluations points). 'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults). | 'bayesopt' |
| AcquisitionFunctionName | 'expected-improvement-per-second-plus', 'expected-improvement', 'expected-improvement-plus', 'expected-improvement-per-second', 'lower-confidence-bound', or 'probability-of-improvement'. For details, see the bayesopt AcquisitionFunctionName name-value pair argument, or Acquisition Function Types. | 'expected-improvement-per-second-plus' |
| MaxObjectiveEvaluations | Maximum number of objective function evaluations. | 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch' |
| MaxTime | Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations. | Inf |
| NumGridDivisions | For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. | 10 |
| ShowPlots | Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. | true |
| SaveIntermediateResults | Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. | false |
| Verbose | Display to the command line: 0 (no iterative display), 1 (iterative display), or 2 (iterative display with extra information). For details, see the bayesopt Verbose name-value pair argument. | 1 |
| UseParallel | Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. For details, see Parallel Bayesian Optimization. | false |
| Repartition | Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. | false |

Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is 'Kfold',5.

| Field Name | Values |
| --- | --- |
| CVPartition | A cvpartition object, as created by cvpartition. |
| Holdout | A scalar in the range (0,1) representing the holdout fraction. |
| Kfold | An integer greater than 1. |

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)

Data Types: struct
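A sketch of how several of these fields combine in one structure. Random search with 10 evaluations and 3-fold cross-validation are illustrative assumptions chosen to keep the run short; with a non-Bayesian optimizer, the stored results are a table rather than a BayesianOptimization object.

```matlab
load fisheriris
rng('default')   % make the random search reproducible

opts = struct('Optimizer','randomsearch', ...
              'MaxObjectiveEvaluations',10, ...
              'Kfold',3, ...            % only one cross-validation field is allowed
              'ShowPlots',false, ...
              'Verbose',0);

Mdl = fitcecoc(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);

results = Mdl.HyperparameterOptimizationResults;   % one row per evaluated point
```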

Output Arguments


Trained ECOC classifier, returned as a ClassificationECOC or CompactClassificationECOC model object, or a ClassificationPartitionedECOC, ClassificationPartitionedLinearECOC, or ClassificationPartitionedKernelECOC cross-validated model object.

This table shows how the types of model objects returned by fitcecoc depend on the type of binary learners you specify and whether you perform cross-validation.

| Linear Classification Model Learners | Kernel Classification Model Learners | Cross-Validation | Returned Model Object |
| --- | --- | --- | --- |
| No | No | No | ClassificationECOC |
| No | No | Yes | ClassificationPartitionedECOC |
| Yes | No | No | CompactClassificationECOC |
| Yes | No | Yes | ClassificationPartitionedLinearECOC |
| No | Yes | No | CompactClassificationECOC |
| No | Yes | Yes | ClassificationPartitionedKernelECOC |
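The table can be checked interactively by inspecting the class of the returned object. This is a sketch on the fisheriris data; the class names reported for the linear-learner cases depend on the release, so compare them with the table for the version in use.

```matlab
load fisheriris

% Default SVM learners, without and with cross-validation
typeFull   = class(fitcecoc(meas,species));
typeFullCV = class(fitcecoc(meas,species,'KFold',5));

% Linear classification model learners, without and with cross-validation
typeLinear   = class(fitcecoc(meas,species,'Learners','linear'));
typeLinearCV = class(fitcecoc(meas,species,'Learners','linear','KFold',5));

disp({typeFull; typeFullCV; typeLinear; typeLinearCV})
```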

Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. HyperparameterOptimizationResults is nonempty when the OptimizeHyperparameters name-value pair is nonempty and Learners is 'linear'. The value depends on the Optimizer field of the HyperparameterOptimizationOptions name-value pair:

  • 'bayesopt' (default) — Object of class BayesianOptimization.

  • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observation from smallest (best) to highest (worst).

Data Types: table

Limitations

  • For training linear or kernel classification models, fitcecoc does not support tables. That is, if Learners is 'linear' or 'kernel', contains a linear classification model learner template (see templateLinear), or contains a kernel classification model learner template (see templateKernel), then you cannot supply Tbl, ResponseVarName, or formula. Supply a matrix of predictor data (X) and an array of responses (Y) instead.

  • fitcecoc supports sparse matrices for training linear classification models only. For all other models, supply a full matrix of predictor data instead.

More About


Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

  • $m_{kj}$ is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).

  • $s_j$ is the score of binary learner j for an observation.

  • g is the binary loss function.

  • $\hat{k}$ is the predicted class for the observation.

In loss-based decoding [Escalera et al.], the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\hat{k} = \underset{k}{\operatorname{argmin}} \sum_{j=1}^{L} |m_{kj}|\, g(m_{kj}, s_j).$$

In loss-weighted decoding [Escalera et al.], the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\hat{k} = \underset{k}{\operatorname{argmin}} \frac{\sum_{j=1}^{L} |m_{kj}|\, g(m_{kj}, s_j)}{\sum_{j=1}^{L} |m_{kj}|}.$$

Allwein et al. suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.
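The two decoding rules can be sketched by hand for a single observation. The coding matrix (one-versus-one, K = 3), the learner scores, and the choice of hinge loss below are hypothetical values chosen for illustration; this is not the toolbox's internal implementation.

```matlab
% K-by-L coding matrix: rows are classes, columns are binary learners
M = [ 1  1  0
     -1  0  1
      0 -1 -1];

s = [0.8 1.2 -0.3];               % hypothetical scores s_j of the L binary learners
g = @(y,s) max(0,1 - y.*s)/2;     % hinge binary loss

% |m_kj| * g(m_kj,s_j) for every class k and learner j
% (M and s combine by implicit expansion, available in R2016b and later)
lossTerms = abs(M).*g(M,s);

% Loss-based decoding: minimum sum over learners
[~,kLossBased] = min(sum(lossTerms,2));

% Loss-weighted decoding: minimum average over the learners that use class k
[~,kLossWeighted] = min(sum(lossTerms,2)./sum(abs(M),2));
```

Here every class takes part in the same number of learners, so both rules select the same class; they can differ when the rows of M contain different numbers of zeros.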

This table summarizes the supported loss functions, where $y_j$ is a class label for a particular binary learner (in the set {–1,1,0}), $s_j$ is the score for observation j, and $g(y_j,s_j)$ is the binary loss formula.

| Value | Description | Score Domain | $g(y_j,s_j)$ |
| --- | --- | --- | --- |
| 'binodeviance' | Binomial deviance | (–∞,∞) | $\log[1 + \exp(-2 y_j s_j)]/[2\log(2)]$ |
| 'exponential' | Exponential | (–∞,∞) | $\exp(-y_j s_j)/2$ |
| 'hamming' | Hamming | [0,1] or (–∞,∞) | $[1 - \operatorname{sign}(y_j s_j)]/2$ |
| 'hinge' | Hinge | (–∞,∞) | $\max(0, 1 - y_j s_j)/2$ |
| 'linear' | Linear | (–∞,∞) | $(1 - y_j s_j)/2$ |
| 'logit' | Logistic | (–∞,∞) | $\log[1 + \exp(-y_j s_j)]/[2\log(2)]$ |
| 'quadratic' | Quadratic | [0,1] | $[1 - y_j(2 s_j - 1)]^2/2$ |

The software normalizes binary losses such that the loss is 0.5 when $y_j = 0$, and aggregates using the average of the binary learners [Allwein et al.].
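The normalization can be verified directly. The sketch below writes the loss formulas from the table above as anonymous functions (the struct g and the score value are hypothetical) and evaluates each one at a class label of 0.

```matlab
% Binary loss functions g(y,s), transcribed from the table
g.binodeviance = @(y,s) log(1 + exp(-2*y.*s))/(2*log(2));
g.exponential  = @(y,s) exp(-y.*s)/2;
g.hamming      = @(y,s) (1 - sign(y.*s))/2;
g.hinge        = @(y,s) max(0,1 - y.*s)/2;
g.linear       = @(y,s) (1 - y.*s)/2;
g.logit        = @(y,s) log(1 + exp(-y.*s))/(2*log(2));
g.quadratic    = @(y,s) (1 - y.*(2*s - 1)).^2/2;

s = 0.3;                        % arbitrary score; it drops out when y = 0
names = fieldnames(g);
lossAtZero = zeros(numel(names),1);
for i = 1:numel(names)
    f = g.(names{i});
    lossAtZero(i) = f(0,s);     % loss for an ignored class (y = 0)
end

disp(table(names,lossAtZero))
```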

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Coding Design

A coding design is a matrix where elements direct which classes are trained by each binary learner, that is, how the multiclass problem is reduced to a series of binary problems.

Each row of the coding design corresponds to a distinct class, and each column corresponds to a binary learner. In a ternary coding design (adopted by the software), for a particular column (or binary learner):

  • Rows containing a 1 indicate to the binary learner to group all observations in the corresponding classes into a positive class.

  • Rows containing a -1 indicate to the binary learner to group all observations in the corresponding classes into a negative class.

  • Rows containing a 0 indicate to the binary learner to ignore all observations in the corresponding classes.

Coding matrices with large, minimal, pair-wise row distances based on the Hamming measure are desirable. For details on the pair-wise row distance, see Random Coding Design Matrices and [4].
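As an illustration of the ternary convention, the sketch below builds one-versus-all and one-versus-one coding matrices for K = 3 by hand. The variable names are hypothetical, and the column order is an assumption that might not match what the software produces.

```matlab
K = 3;                                  % number of classes

% One-versus-all: class k is positive for learner k, all others are negative.
Mova = 2*eye(K) - 1;

% One-versus-one: one column per class pair (i,j) with i < j.
pairs = nchoosek(1:K, 2);               % each row is one pair of classes
L = size(pairs, 1);                     % K*(K-1)/2 learners
Movo = zeros(K, L);
for l = 1:L
    Movo(pairs(l,1), l) =  1;           % first class of the pair is positive
    Movo(pairs(l,2), l) = -1;           % second class of the pair is negative
end
```

In practice, the Statistics and Machine Learning Toolbox function designecoc generates coding matrices for the named designs, so a hand-built matrix like this is mainly useful for understanding the layout.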

This table describes popular coding designs. For the example, suppose K (the number of distinct classes) is 3.

| Coding Design | Description | Number of Learners | Minimal Pair-Wise Row Distance |
| --- | --- | --- | --- |
| one-versus-all (OVA) | For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments. | K | 2 |
| one-versus-one (OVO) | For each binary learner, one class is positive, another is negative, and the rest are ignored. This design exhausts all combinations of class pair assignments. | K(K - 1)/2 | 1 |
| binary complete | This design partitions the classes into all binary combinations, and does not ignore any classes. That is, all class assignments are -1 and 1 with at least one positive and negative class in the assignment for each binary learner. | 2^(K - 1) - 1 | 2^(K - 2) |
| ternary complete | This design partitions the classes into all ternary combinations. That is, all class assignments are 0, -1, and 1 with at least one positive and negative class in the assignment for each binary learner. | (3^K - 2^(K + 1) + 1)/2 | 3^(K - 2) |
| ordinal | For the first binary learner, the first class is negative, and the rest positive. For the second binary learner, the first two classes are negative, and the rest positive, and so on. | K - 1 | 1 |
| dense random | For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see Random Coding Design Matrices. | Random, but approximately 10 log2(K) | Variable |
| sparse random | For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see Random Coding Design Matrices. | Random, but approximately 15 log2(K) | Variable |

This plot compares the number of binary learners for the coding designs with increasing K.
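The learner counts in the table can be tabulated and plotted directly. The sketch below assumes the counts for the random designs are rounded up to the next integer; the actual number of columns the software keeps is random. Variable names are hypothetical.

```matlab
K = (3:8)';                               % numbers of classes to compare

% One column per coding design, one row per value of K.
counts = [K, ...                          % one-versus-all
          K.*(K - 1)/2, ...               % one-versus-one
          2.^(K - 1) - 1, ...             % binary complete
          (3.^K - 2.^(K + 1) + 1)/2, ...  % ternary complete
          K - 1, ...                      % ordinal
          ceil(10*log2(K)), ...           % dense random (approximate, assumed rounding)
          ceil(15*log2(K))];              % sparse random (approximate, assumed rounding)

% Logarithmic y-axis, because the complete designs grow exponentially in K.
semilogy(K, counts, '-o');
xlabel('Number of classes (K)');
ylabel('Number of binary learners');
legend({'OVA', 'OVO', 'binary complete', 'ternary complete', ...
        'ordinal', 'dense random', 'sparse random'}, 'Location', 'northwest');
```

For K = 3 the ternary complete design already needs twice as many learners as one-versus-one, and the gap widens quickly as K increases.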

Error-Correcting Output Codes Model

An error-correcting output codes (ECOC) model reduces the problem of classification with three or more classes to a set of binary classifiers.

ECOC classification requires a coding design, which determines the classes that the binary learners train on, and a decoding scheme, which determines how the results (predictions) of the binary classifiers are aggregated.

Suppose that:

  • There are three classes.

  • The coding design is one-versus-one.

  • The decoding scheme uses loss g.

  • The learners are SVMs.

To build this classification model, ECOC follows these steps.

  1. A one-versus-one coding design is

    |         | Learner 1 | Learner 2 | Learner 3 |
    | --- | --- | --- | --- |
    | Class 1 | 1 | 1 | 0 |
    | Class 2 | -1 | 0 | 1 |
    | Class 3 | 0 | -1 | -1 |

    Learner 1 trains on observations having Class 1 and Class 2, and treats Class 1 as the positive class and Class 2 as the negative class. The other learners are trained similarly. Let M be the coding design matrix with elements $m_{kl}$, and $s_l$ be the predicted classification score for the positive class of learner l.

  2. A new observation is assigned to the class (k^) that minimizes the aggregation of the losses for the L binary learners. That is,

    $$\hat{k} = \underset{k}{\operatorname{argmin}} \frac{\sum_{l=1}^{L} |m_{kl}|\, g(m_{kl}, s_l)}{\sum_{l=1}^{L} |m_{kl}|}.$$

ECOC models can improve classification accuracy, even compared to other multiclass models [2].
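The two steps above can be traced by hand for the three-class, one-versus-one case. In this sketch the scores are hypothetical values rather than the output of trained SVMs, and the hinge loss stands in for g.

```matlab
% One-versus-one coding design for three classes (rows are classes).
M = [ 1  1  0
     -1  0  1
      0 -1 -1];

% Hypothetical positive-class scores from the three binary learners:
% learner 1 favors Class 2, learner 2 favors Class 1, learner 3 favors Class 2.
scores = [-1.2 0.8 2.0];

hinge = @(y, sc) max(0, 1 - y.*sc) / 2;   % binary loss g

% Average binary loss of each class over the learners that use it.
S = repmat(scores, size(M, 1), 1);
avgLoss = sum(abs(M) .* hinge(M, S), 2) ./ sum(abs(M), 2);

[~, kHat] = min(avgLoss);                 % predicted class index
```

Here the average losses are approximately 0.6, 0, and 1.2, so the observation is assigned to Class 2, the class that both of its learners agree on.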

Tips

  • The number of binary learners grows with the number of classes. For a problem with many classes, the binarycomplete and ternarycomplete coding designs are not efficient. However:

    • If K ≤ 4, then use ternarycomplete coding design rather than sparserandom.

    • If K ≤ 5, then use binarycomplete coding design rather than denserandom.

    You can display the coding design matrix of a trained ECOC classifier by entering Mdl.CodingMatrix into the Command Window.

  • You should form a coding matrix using intimate knowledge of the application, and taking into account computational constraints. If you have sufficient computational power and time, then try several coding matrices and choose the one with the best performance (e.g., check the confusion matrices for each model using confusionmat).

  • Leave-one-out cross-validation (Leaveout) is inefficient for data sets with many observations. Instead, use k-fold cross-validation (KFold).

  • After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
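The tips about inspecting the coding matrix, comparing designs, and preferring k-fold cross-validation can be combined in a short workflow. This is a sketch under stated assumptions: it uses the fisheriris sample data set, five folds, and hypothetical variable names, and it requires Statistics and Machine Learning Toolbox.

```matlab
load fisheriris                        % sample data: meas (predictors), species (labels)

% Train with two coding designs and inspect each coding matrix.
MdlOvo = fitcecoc(meas, species, 'Coding', 'onevsone');
MdlOva = fitcecoc(meas, species, 'Coding', 'onevsall');
disp(MdlOvo.CodingMatrix)
disp(MdlOva.CodingMatrix)

% Estimate the generalization error with 5-fold cross-validation.
rng(1);                                % for reproducible fold assignment
CVOvo = crossval(MdlOvo, 'KFold', 5);
CVOva = crossval(MdlOva, 'KFold', 5);
errOvo = kfoldLoss(CVOvo);
errOva = kfoldLoss(CVOva);

% Confusion matrix of the out-of-fold predictions for one design.
confOvo = confusionmat(species, kfoldPredict(CVOvo));
```

Comparing errOvo with errOva, together with the confusion matrices, is one way to choose between candidate coding designs for a given application.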

Algorithms


Custom Coding Design Matrices

Custom coding matrices must have a certain form. The software validates custom coding matrices by ensuring:

  • Every element is -1, 0, or 1.

  • Every column contains at least one -1 and one 1.

  • For all distinct column vectors u and v, u ≠ v and u ≠ -v.

  • All row vectors are unique.

  • The matrix can separate any two classes. That is, you can travel from any row to any other row following these rules:

    • You can move vertically from 1 to -1 or -1 to 1.

    • You can move horizontally from a nonzero element to another nonzero element.

    • You can use a column of the matrix for a vertical move only once.

    If it is not possible to move from row i to row j using these rules, then classes i and j cannot be separated by the design. For example, in the coding design

    $$\begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix}$$

    classes 1 and 2 cannot be separated from classes 3 and 4 (that is, you cannot move horizontally from the -1 in row 2 to column 2 since there is a 0 in that position). Therefore, the software rejects this coding design.
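The checks listed above can be approximated in a few lines. The sketch below is not the software's validation code: the variable names are hypothetical, and the separability test is a simplified connectivity check that treats two classes as linked when some column assigns them opposite signs, ignoring the rule that a column can be used for a vertical move only once.

```matlab
% The rejected design from the example above (rows are classes).
M = [ 1  0
     -1  0
      0  1
      0 -1];
K = size(M, 1);

okElements = all(ismember(M(:), [-1 0 1]));           % entries are -1, 0, or 1
okColumns  = all(any(M == 1, 1) & any(M == -1, 1));   % each column has a 1 and a -1
% No two distinct columns satisfy u == v or u == -v: the columns together
% with their negations must all be different.
okDistinct = size(unique([M, -M]', 'rows'), 1) == 2*size(M, 2);
okRows     = size(unique(M, 'rows'), 1) == K;         % all rows are unique

% Simplified separability check (assumption): link classes i and j when a
% column gives them opposite signs, then test whether all classes are connected.
P = double(M == 1);
N = double(M == -1);
A = (P*N' + N*P') > 0;                                % adjacency between classes
reach = (eye(K) + double(A))^(K - 1) > 0;             % reachability within K-1 steps
separable = all(reach(:));
```

For this matrix the first four checks pass, but separable is false because classes 1 and 2 are never linked to classes 3 and 4, which is consistent with the design being rejected.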

Parallel Computing

If you use parallel computing (see Options), then fitcecoc trains binary learners in parallel.

Prior Probabilities and Cost

  • Prior probabilities: The software normalizes the specified class prior probabilities (Prior) for each binary learner. Let M be the coding design matrix and I(A,c) be an indicator matrix. The indicator matrix has the same dimensions as A. If the corresponding element of A is c, then the indicator matrix has elements equaling one, and zero otherwise. Let $M_{+1}$ and $M_{-1}$ be K-by-L matrices such that:

    • $M_{+1} = M \circ I(M,1)$, where $\circ$ is element-wise multiplication (that is, Mplus = M.*(M == 1)). Also, let $m_l^{(+1)}$ be column vector l of $M_{+1}$.

    • $M_{-1} = -M \circ I(M,-1)$ (that is, Mminus = -M.*(M == -1)). Also, let $m_l^{(-1)}$ be column vector l of $M_{-1}$.

    Let $\pi_l^{(+1)} = m_l^{(+1)} \circ \pi$ and $\pi_l^{(-1)} = m_l^{(-1)} \circ \pi$, where $\pi$ is the vector of specified class prior probabilities (Prior).

    Then, the positive and negative scalar class prior probabilities for binary learner l are

    $$\hat{\pi}_l^{(j)} = \frac{\|\pi_l^{(j)}\|_1}{\|\pi_l^{(+1)}\|_1 + \|\pi_l^{(-1)}\|_1},$$

    where $j \in \{-1, +1\}$ and $\|a\|_1$ is the one-norm of a.

  • Cost: The software normalizes the K-by-K cost matrix C (Cost) for each binary learner. For binary learner l, the cost of classifying a negative-class observation into the positive class is

    $$c_l^{-+} = (\pi_l^{(-1)})^\top C\, \pi_l^{(+1)}.$$

    Similarly, the cost of classifying a positive-class observation into the negative class is

    $$c_l^{+-} = (\pi_l^{(+1)})^\top C\, \pi_l^{(-1)}.$$

    The cost matrix for binary learner l is

    $$C_l = \begin{bmatrix} 0 & c_l^{-+} \\ c_l^{+-} & 0 \end{bmatrix}.$$

    ECOC models accommodate misclassification costs by incorporating them with class prior probabilities. If you specify Prior and Cost, then the software adjusts the class prior probabilities as follows:

    $$\bar{\pi}_l^{(-1)} = \frac{c_l^{-+}\hat{\pi}_l^{(-1)}}{c_l^{-+}\hat{\pi}_l^{(-1)} + c_l^{+-}\hat{\pi}_l^{(+1)}}, \qquad \bar{\pi}_l^{(+1)} = \frac{c_l^{+-}\hat{\pi}_l^{(+1)}}{c_l^{-+}\hat{\pi}_l^{(-1)} + c_l^{+-}\hat{\pi}_l^{(+1)}}.$$
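The prior and cost normalization can be followed numerically for a small case. This sketch applies the formulas in this section to a one-versus-one design with K = 3; the prior vector, the cost matrix, and all variable names are hypothetical, and the code is not the software's internal implementation.

```matlab
M     = [1 1 0; -1 0 1; 0 -1 -1];   % one-versus-one design, rows are classes
prior = [0.5; 0.3; 0.2];            % hypothetical class priors (K-by-1)
C     = [0 1 2; 3 0 1; 2 1 0];      % hypothetical cost matrix, C(i,j): true i, predicted j

Mplus  =  M .* (M ==  1);           % 1 where a class is positive for a learner
Mminus = -M .* (M == -1);           % 1 where a class is negative for a learner

L = size(M, 2);
priorHat = zeros(2, L);             % row 1: negative class, row 2: positive class
cost     = zeros(2, L);             % row 1: negative-to-positive, row 2: positive-to-negative
for l = 1:L
    piPlus  = Mplus(:, l)  .* prior;
    piMinus = Mminus(:, l) .* prior;
    total   = norm(piPlus, 1) + norm(piMinus, 1);
    priorHat(:, l) = [norm(piMinus, 1); norm(piPlus, 1)] / total;
    cost(:, l)     = [piMinus' * C * piPlus; piPlus' * C * piMinus];
end

% Cost-adjusted priors: weight each normalized prior by its misclassification
% cost, then renormalize so each column sums to 1.
w = cost .* priorHat;
priorBar = w ./ repmat(sum(w, 1), 2, 1);
```

For learner 1, the normalized priors are 0.375 (negative) and 0.625 (positive). Because misclassifying Class 2 as Class 1 costs three times as much as the reverse in this hypothetical C, the adjusted negative-class prior rises to 9/14.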

Random Coding Design Matrices

For a given number of classes, for example, K, the software generates random coding design matrices as follows.

  1. The software generates one of the following:

    1. Dense random: The software assigns a 1 or -1 with equal probability to each element of the K-by-$L_d$ coding design matrix, where $L_d \approx \lceil 10 \log_2 K \rceil$.

    2. Sparse random: The software assigns a 1 to each element of the K-by-$L_s$ coding design matrix with probability 0.25, a -1 with probability 0.25, and a 0 with probability 0.5, where $L_s \approx \lceil 15 \log_2 K \rceil$.

  2. If a column does not contain at least one 1 and at least one -1, then the software removes that column.

  3. For distinct columns u and v, if u = v or u = -v, then the software removes v from the coding design matrix.

The software randomly generates 10,000 matrices by default, and retains the matrix with the largest, minimal pairwise row distance based on the Hamming measure ([4]) given by

$$\Delta(k_1, k_2) = 0.5 \sum_{l=1}^{L} |m_{k_1 l}|\, |m_{k_2 l}|\, |m_{k_1 l} - m_{k_2 l}|,$$

where $m_{k_j l}$ is the element in row $k_j$ and column l of the coding design matrix.
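The generation procedure for a dense random design can be sketched as follows. This is an illustration under assumptions, not the software's code: it uses 100 trials instead of the default 10,000, the names are hypothetical, and the sign-normalization trick for detecting u = -v works only because dense designs contain no zeros.

```matlab
rng(0);                                   % fixed seed so the sketch is repeatable
K  = 4;                                   % number of classes
Ld = ceil(10*log2(K));                    % candidate number of columns
nTrials = 100;                            % assumption: far fewer than the default

% Distance between rows a and b of M, as in the formula above.
rowDist = @(M, a, b) 0.5*sum(abs(M(a,:)) .* abs(M(b,:)) .* abs(M(a,:) - M(b,:)));
pairs = nchoosek(1:K, 2);                 % all pairs of classes

best = [];
bestDist = -Inf;
for t = 1:nTrials
    M = 2*(rand(K, Ld) > 0.5) - 1;                   % step 1: +1 or -1, equal probability
    M = M(:, any(M == 1, 1) & any(M == -1, 1));      % step 2: keep columns with both signs
    % Step 3: flip each column so its first entry is +1, then keep the first
    % copy of each distinct column; this removes v whenever u == v or u == -v.
    canon = M .* repmat(M(1,:), K, 1);
    [~, keep] = unique(canon', 'rows', 'stable');
    M = M(:, keep);
    d = arrayfun(@(p) rowDist(M, pairs(p,1), pairs(p,2)), 1:size(pairs, 1));
    if min(d) > bestDist                             % retain the largest minimal distance
        bestDist = min(d);
        best = M;
    end
end
```

As a check on the distance formula, rows 1 and 2 of the three-class one-versus-one design are at distance 1, which agrees with the minimal pair-wise row distance listed for that design.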

Support Vector Storage

For linear SVM binary learners, fitcecoc empties the properties Alpha, SupportVectorLabels, and SupportVectors for efficiency. fitcecoc lists Beta, rather than Alpha, in the model display.

To store Alpha, SupportVectorLabels, and SupportVectors, pass a linear SVM template that specifies storing support vectors to fitcecoc. For example, enter:

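t = templateSVM('SaveSupportVectors','on');   % template that keeps Alpha, SupportVectorLabels, and SupportVectors
Mdl = fitcecoc(X,Y,'Learners',t);             % train the ECOC model using this SVM template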

You can remove the support vectors and related values by passing the resulting ClassificationECOC model to discardSupportVectors.

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Fürnkranz, J. “Round Robin Classification.” Journal of Machine Learning Research. Vol. 2, 2002, pp. 721–747.

[3] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[4] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

Introduced in R2014b