# kfoldLoss

Classification loss for observations not used in training

## Syntax

``L = kfoldLoss(CVMdl)``
``L = kfoldLoss(CVMdl,Name,Value)``

## Description


`L = kfoldLoss(CVMdl)` returns the cross-validated classification error rates estimated by the cross-validated, error-correcting output codes (ECOC) model composed of linear classification models `CVMdl`. That is, for every fold, `kfoldLoss` estimates the classification error rate for the observations that it holds out when it trains using all other observations. `kfoldLoss` applies the same data used to create `CVMdl` (see `fitcecoc`). `L` contains a classification loss for each regularization strength in the linear classification models that compose `CVMdl`.


`L = kfoldLoss(CVMdl,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. For example, specify a decoding scheme, which folds to use for the loss calculation, or the verbosity level.

## Input Arguments


Cross-validated, ECOC model composed of linear classification models, specified as a `ClassificationPartitionedLinearECOC` model object. You can create a `ClassificationPartitionedLinearECOC` model using `fitcecoc` and by:

1. Specifying any one of the cross-validation, name-value pair arguments, for example, `CrossVal`

2. Setting the name-value pair argument `Learners` to `'linear'` or a linear classification model template returned by `templateLinear`

To obtain estimates, `kfoldLoss` applies the same data used to cross-validate the ECOC model (`X` and `Y`).

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Binary learner loss function, specified as the comma-separated pair consisting of `'BinaryLoss'` and a built-in, loss-function name or function handle.

• This table contains names and descriptions of the built-in functions, where $y_j$ is a class label for a particular binary learner (in the set {–1,1,0}), $s_j$ is the score for observation $j$, and $g(y_j,s_j)$ is the binary loss formula.

| Value | Description | Score Domain | $g(y_j,s_j)$ |
|---|---|---|---|
| `'binodeviance'` | Binomial deviance | (–∞,∞) | $\log[1 + \exp(-2y_js_j)]/[2\log(2)]$ |
| `'exponential'` | Exponential | (–∞,∞) | $\exp(-y_js_j)/2$ |
| `'hamming'` | Hamming | [0,1] or (–∞,∞) | $[1 - \mathrm{sign}(y_js_j)]/2$ |
| `'hinge'` | Hinge | (–∞,∞) | $\max(0,1 - y_js_j)/2$ |
| `'linear'` | Linear | (–∞,∞) | $(1 - y_js_j)/2$ |
| `'logit'` | Logistic | (–∞,∞) | $\log[1 + \exp(-y_js_j)]/[2\log(2)]$ |
| `'quadratic'` | Quadratic | [0,1] | $[1 - y_j(2s_j - 1)]^2/2$ |

The software normalizes the binary losses such that the loss is 0.5 when $y_j = 0$. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, e.g., `customFunction`, specify its function handle `'BinaryLoss',@customFunction`.

`customFunction` should have this form

`bLoss = customFunction(M,s)`
where:

• `M` is the K-by-L coding matrix stored in `Mdl.CodingMatrix`.

• `s` is the 1-by-L row vector of classification scores.

• `bLoss` is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.

• K is the number of classes.

• L is the number of binary learners.

For an example of passing a custom binary loss function, see Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function.
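The form described above can be sketched as an anonymous function. This is only an illustration: the linear loss, the coding matrix `M`, and the scores `s` below are made-up values, and returning one aggregated loss per class (a `K`-by-1 vector) is an assumption about how the handle is evaluated.

```matlab
% Hypothetical custom binary loss: the linear loss (1 - m*s)/2,
% averaged over the L binary learners for each class.
% M is the K-by-L coding matrix; s is the 1-by-L row vector of scores.
customBL = @(M,s) mean(1 - bsxfun(@times,M,s), 2) / 2;

% Illustrative one-versus-one coding matrix for K = 3 classes, L = 3 learners
M = [ 1  1  0
     -1  0  1
      0 -1 -1];
s = [0.9 -0.2 0.4];               % made-up scores for one observation

bLoss = customBL(M,s);            % K-by-1: one aggregated loss per class
[~,predictedClass] = min(bLoss);  % the class with the smallest loss
```

Under these assumptions, the handle could then be passed as `kfoldLoss(CVMdl,'BinaryLoss',customBL)`.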

By default, if all binary learners are linear classification models using:

• SVM, then `BinaryLoss` is `'hinge'`

• Logistic regression, then `BinaryLoss` is `'quadratic'`

Example: `'BinaryLoss','binodeviance'`

Data Types: `char` | `string` | `function_handle`
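The built-in formulas listed for `'BinaryLoss'` can be written out directly to check the stated normalization. The following is a sketch in plain MATLAB, not the software's internal implementation; the struct `g` and the score `s` are hypothetical names and values chosen for illustration.

```matlab
% Binary loss formulas g(y,s) from the table, as anonymous functions.
% y is the coding label in {-1,0,1}; s is the binary learner's score.
g = struct();
g.binodeviance = @(y,s) log(1 + exp(-2*y.*s)) / (2*log(2));
g.exponential  = @(y,s) exp(-y.*s) / 2;
g.hamming      = @(y,s) (1 - sign(y.*s)) / 2;
g.hinge        = @(y,s) max(0, 1 - y.*s) / 2;
g.linear       = @(y,s) (1 - y.*s) / 2;
g.logit        = @(y,s) log(1 + exp(-y.*s)) / (2*log(2));
g.quadratic    = @(y,s) (1 - y.*(2*s - 1)).^2 / 2;

s = 0.8;                          % made-up score
names = fieldnames(g);
lossAtZero = zeros(numel(names),1);
for k = 1:numel(names)
    % Evaluate each loss at y = 0, where the learner ignores the class
    lossAtZero(k) = g.(names{k})(0, s);
end
```

Each entry of `lossAtZero` evaluates to 0.5, which matches the normalization described for $y_j = 0$.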

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of `'Decoding'` and `'lossweighted'` or `'lossbased'`. For more information, see Binary Loss.

Example: `'Decoding','lossbased'`

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of `'Folds'` and a numeric vector of positive integers. The elements of `Folds` must range from `1` through `CVMdl.KFold`.

Example: `'Folds',[1 4 10]`

Data Types: `single` | `double`

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and a function handle or `'classiferror'`.

You can:

• Specify the built-in function `'classiferror'`, then the loss function is the classification error.

• Specify your own function using function handle notation.

For what follows, `n` is the number of observations in the training data (`CVMdl.NumObservations`) and `K` is the number of classes (`numel(CVMdl.ClassNames)`). Your function needs the signature `lossvalue = lossfun(C,S,W,Cost)`, where:

• The output argument `lossvalue` is a scalar.

• You choose the function name (`lossfun`).

• `C` is an `n`-by-`K` logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in `CVMdl.ClassNames`.

Construct `C` by setting `C(p,q) = 1` if observation `p` is in class `q`, for each row. Set all other elements of row `p` to `0`.

• `S` is an `n`-by-`K` numeric matrix of negated loss values for classes. Each row corresponds to an observation. The column order corresponds to the class order in `CVMdl.ClassNames`. `S` resembles the output argument `NegLoss` of `kfoldPredict`.

• `W` is an `n`-by-1 numeric vector of observation weights. If you pass `W`, the software normalizes its elements to sum to `1`.

• `Cost` is a `K`-by-`K` numeric matrix of misclassification costs. For example, `Cost = ones(K) - eye(K)` specifies a cost of 0 for correct classification, and 1 for misclassification.

Specify your function using `'LossFun',@lossfun`.
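A function with this signature can be sketched and tried on small inputs before passing it to `kfoldLoss`. The loss below (a weighted misclassification rate) and all input values are hypothetical and serve only to illustrate the roles of `C`, `S`, `W`, and `Cost`.

```matlab
% Hypothetical custom loss: weighted misclassification rate.
% An observation counts as misclassified when the negated loss of its true
% class (selected by C) is below the largest negated loss in its row of S.
lossfun = @(C,S,W,Cost) sum(W .* (sum(S.*C,2) < max(S,[],2))) / sum(W);

% Made-up inputs for n = 4 observations and K = 2 classes
C = logical([1 0; 0 1; 0 1; 0 1]);                % true class per row
S = [-0.1 -0.9; -0.8 -0.2; -0.3 -0.7; -0.6 -0.4]; % negated losses
W = [0.25; 0.25; 0.25; 0.25];                     % weights summing to 1
Cost = ones(2) - eye(2);                          % unused by this loss

lossvalue = lossfun(C,S,W,Cost);  % only the third observation is misclassified
```

Because the handle is anonymous, it would be passed without `@`, as in `kfoldLoss(CVMdl,'LossFun',lossfun)`.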

Data Types: `function_handle` | `char` | `string`

Loss aggregation level, specified as the comma-separated pair consisting of `'Mode'` and `'average'` or `'individual'`.

| Value | Description |
|---|---|
| `'average'` | Returns losses averaged over all folds |
| `'individual'` | Returns losses for each fold |

Example: `'Mode','individual'`

Estimation options, specified as the comma-separated pair consisting of `'Options'` and a structure array returned by `statset`.

To invoke parallel computing:

• You need a Parallel Computing Toolbox™ license.

• Specify `'Options',statset('UseParallel',true)`.

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and `0` or `1`. `Verbose` controls the number of diagnostic messages that the software displays in the Command Window.

If `Verbose` is `0`, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: `'Verbose',1`

Data Types: `single` | `double`

## Output Arguments


Cross-validated classification losses, returned as a numeric scalar, vector, or matrix. The interpretation of `L` depends on `LossFun`.

Let `R` be the number of regularization strengths in the cross-validated models (stored in `CVMdl.Trained{1}.BinaryLearners{1}.Lambda`) and `F` be the number of folds (stored in `CVMdl.KFold`).

• If `Mode` is `'average'`, then `L` is a 1-by-`R` vector. `L(j)` is the average classification loss over all folds of the cross-validated model that uses regularization strength `j`.

• Otherwise, `L` is an `F`-by-`R` matrix. `L(i,j)` is the classification loss for fold `i` of the cross-validated model that uses regularization strength `j`.

## Examples


Load the NLP data set.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels.

Cross-validate an ECOC model of linear classification models.

```
rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','linear','CrossVal','on');
```

`CVMdl` is a `ClassificationPartitionedLinearECOC` model. By default, the software implements 10-fold cross validation.

Estimate the average of the out-of-fold classification error rates.

`ce = kfoldLoss(CVMdl)`
```ce = 0.0958 ```

Alternatively, you can obtain the per-fold classification error rates by specifying the name-value pair `'Mode','individual'` in `kfoldLoss`.
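A minimal sketch of that call, repeating the setup from this example. It assumes that the `nlpdata` sample data set and Statistics and Machine Learning Toolbox are available. With the default 10 folds and a single regularization strength, the result should be a 10-by-1 vector.

```matlab
load nlpdata   % provides X (sparse predictor data) and Y (class labels)
rng(1);        % for reproducibility
CVMdl = fitcecoc(X,Y,'Learners','linear','CrossVal','on');

% Request one classification error per fold instead of a single average
ceFold = kfoldLoss(CVMdl,'Mode','individual');
```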

Load the NLP data set. Transpose the predictor data.

```
load nlpdata
X = X';
```

For simplicity, use the label 'others' for all observations in `Y` that are not `'simulink'`, `'dsp'`, or `'comm'`.

`Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';`

Create a linear classification model template that specifies optimizing the objective function using SpaRSA.

`t = templateLinear('Solver','sparsa');`

Cross-validate an ECOC model of linear classification models using 5-fold cross-validation. Optimize the objective function using SpaRSA. Specify that the predictor observations correspond to columns.

```
rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5,'ObservationsIn','columns');
CMdl1 = CVMdl.Trained{1}
```

```
CMdl1 = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm    dsp    simulink    others]
    ScoreTransform: 'none'
    BinaryLearners: {6x1 cell}
      CodingMatrix: [4x6 double]

  Properties, Methods
```

`CVMdl` is a `ClassificationPartitionedLinearECOC` model. It contains the property `Trained`, which is a 5-by-1 cell array holding the `CompactClassificationECOC` models that the software trained using the training set of each fold.

Create a function that takes the minimal loss for each observation, and then averages the minimal losses across all observations. Because the function does not use the class-identifier matrix (`C`), observation weights (`W`), or classification cost (`Cost`), use `~` to have `kfoldLoss` ignore their positions.

`lossfun = @(~,S,~,~)mean(min(-S,[],2));`

Estimate the average cross-validated classification loss using the minimal loss per observation function. Also, obtain the loss for each fold.

`ce = kfoldLoss(CVMdl,'LossFun',lossfun)`
```ce = 0.0243 ```
`ceFold = kfoldLoss(CVMdl,'LossFun',lossfun,'Mode','individual')`
```
ceFold = 5×1

    0.0244
    0.0255
    0.0248
    0.0240
    0.0226
```
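As a quick consistency check, the unweighted mean of the per-fold values displayed above is close to the average loss of 0.0243. This sketch only compares the rounded, displayed numbers. It assumes folds of similar size and does not claim to reproduce how `kfoldLoss` computes the average internally.

```matlab
% Per-fold losses as displayed in the example output (rounded to 4 decimals)
ceFold = [0.0244; 0.0255; 0.0248; 0.0240; 0.0226];

% Unweighted mean over the 5 folds
ceMean = mean(ceFold);

% Difference from the displayed average loss
gap = abs(ceMean - 0.0243);
```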

To determine a good lasso-penalty strength for an ECOC model composed of linear classification models that use logistic regression learners, implement 5-fold cross-validation.

Load the NLP data set.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels.

For simplicity, use the label 'others' for all observations in `Y` that are not `'simulink'`, `'dsp'`, or `'comm'`.

`Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';`

Create a set of 11 logarithmically spaced regularization strengths from $10^{-7}$ through $10^{-2}$.

`Lambda = logspace(-7,-2,11);`

Create a linear classification model template that specifies to use logistic regression learners, use lasso penalties with strengths in `Lambda`, train using SpaRSA, and lower the tolerance on the gradient of the objective function to `1e-8`.

```
t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);
```

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns.

```
X = X';
rng(10); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5);
```

`CVMdl` is a `ClassificationPartitionedLinearECOC` model.

Dissect `CVMdl`, and each model within it.

`numECOCModels = numel(CVMdl.Trained)`
```numECOCModels = 5 ```
`ECOCMdl1 = CVMdl.Trained{1}`
```
ECOCMdl1 = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm    dsp    simulink    others]
    ScoreTransform: 'none'
    BinaryLearners: {6×1 cell}
      CodingMatrix: [4×6 double]

  Properties, Methods
```
`numCLModels = numel(ECOCMdl1.BinaryLearners)`
```numCLModels = 6 ```
`CLMdl1 = ECOCMdl1.BinaryLearners{1}`
```
CLMdl1 = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [-1 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-0.3169 -0.3169 -0.3168 -0.3168 -0.3168 -0.3167 -0.1725 -0.0805 -0.1762 -0.3450 -0.5174]
            Lambda: [1.0000e-07 3.1623e-07 1.0000e-06 3.1623e-06 1.0000e-05 3.1623e-05 1.0000e-04 3.1623e-04 1.0000e-03 0.0032 0.0100]
           Learner: 'logistic'

  Properties, Methods
```

Because `fitcecoc` implements 5-fold cross-validation, `CVMdl` contains a 5-by-1 cell array of `CompactClassificationECOC` models that the software trains on each fold. The `BinaryLearners` property of each `CompactClassificationECOC` model contains the `ClassificationLinear` models. The number of `ClassificationLinear` models within each compact ECOC model depends on the number of distinct labels and coding design. Because `Lambda` is a sequence of regularization strengths, you can think of `CLMdl1` as 11 models, one for each regularization strength in `Lambda`.

Determine how well the models generalize by plotting the averages of the 5-fold classification error for each regularization strength. Identify the regularization strength that minimizes the generalization error over the grid.

```
ce = kfoldLoss(CVMdl);
figure;
plot(log10(Lambda),log10(ce))
[~,minCEIdx] = min(ce);
minLambda = Lambda(minCEIdx);
hold on
plot(log10(minLambda),log10(ce(minCEIdx)),'ro');
ylabel('log_{10} 5-fold classification error')
xlabel('log_{10} Lambda')
legend('Classification error','Min classification error')
hold off
```

Train an ECOC model composed of linear classification models using the entire data set, and specify the regularization strength that minimizes the cross-validated classification error (`minLambda`).

```
t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',minLambda,'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');
```

To estimate labels for new observations, pass `MdlFinal` and the new data to `predict`.



## Extended Capabilities

Introduced in R2016a