# ClassificationPartitionedLinearECOC

Package: classreg.learning.partition
Superclasses: `ClassificationPartitionedModel`

Cross-validated linear error-correcting output codes model for multiclass classification of high-dimensional data

## Description

`ClassificationPartitionedLinearECOC` is a set of error-correcting output codes (ECOC) models composed of linear classification models, trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more “kfold” functions: `kfoldPredict`, `kfoldLoss`, `kfoldMargin`, and `kfoldEdge`.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation to one of five roughly equal-sized groups. For each fold, the training set contains four of the groups (roughly 4/5 of the data) and the test set contains the remaining group (roughly 1/5 of the data). Cross-validation then proceeds as follows.

1. The software trains the first model (stored in `CVMdl.Trained{1}`) using the observations in the last four groups and reserves the observations in the first group for validation.

2. The software trains the second model (stored in `CVMdl.Trained{2}`) using the observations in the first group and last three groups. The software reserves the observations in the second group for validation.

3. The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by calling `kfoldPredict`, it computes predictions for the observations in group 1 using the first model, for the observations in group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.
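The procedure above can be sketched by hand. The following is a minimal illustration, not the internal implementation of `kfoldPredict`. It assumes the `fisheriris` sample data set (which is not used elsewhere on this page), and the variable names `c`, `oofLabels`, and `oofErr` are hypothetical.

```matlab
load fisheriris                          % meas: 150-by-4 predictors, species: 150-by-1 labels
rng(1);                                  % For reproducibility
c = cvpartition(species,'KFold',5);      % Assign each observation to one of five groups

oofLabels = cell(size(species));         % Out-of-fold predictions, one per observation
for k = 1:c.NumTestSets
    trainIdx = training(c,k);            % Logical index of the four in-fold groups
    testIdx  = test(c,k);                % Logical index of the held-out group
    % Train on the in-fold observations only
    Mdl = fitcecoc(meas(trainIdx,:),species(trainIdx),'Learners','linear');
    % Predict the held-out observations with the model that never saw them
    oofLabels(testIdx) = predict(Mdl,meas(testIdx,:));
end

oofErr = mean(~strcmp(oofLabels,species)) % Out-of-fold misclassification rate
```

Each observation receives exactly one prediction, and that prediction comes from the model trained without the observation.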

Note

`ClassificationPartitionedLinearECOC` model objects do not store the predictor data set.

## Construction

`CVMdl = fitcecoc(X,Y,'Learners',t,Name,Value)` returns a cross-validated, linear ECOC model when:

• `t` is `'Linear'` or a template object returned by `templateLinear`.

• `Name` is one of `'CrossVal'`, `'CVPartition'`, `'Holdout'`, or `'KFold'`.

For more details, see `fitcecoc`.
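As a rough sketch of the four cross-validation options listed above, the following calls each request a cross-validated linear ECOC model. The `fisheriris` data set and the variable names are assumptions made for illustration; the data set is small and dense, unlike the high-dimensional data this class targets.

```matlab
load fisheriris
rng(1); % For reproducibility

% 'CrossVal','on' uses the default number of folds
CVMdl1 = fitcecoc(meas,species,'Learners','linear','CrossVal','on');

% 'KFold' sets the number of folds explicitly
CVMdl2 = fitcecoc(meas,species,'Learners','linear','KFold',5);

% 'Holdout' reserves a fraction of the data for validation;
% the learner here is a template object returned by templateLinear
CVMdl3 = fitcecoc(meas,species,'Learners',templateLinear(),'Holdout',0.2);

% 'CVPartition' reuses a partition created beforehand
c = cvpartition(species,'KFold',3);
CVMdl4 = fitcecoc(meas,species,'Learners','linear','CVPartition',c);

modelClass = class(CVMdl1) % Inspect the class of the returned object
```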

## Properties


Cross-Validation Properties

`CrossValidatedModel`: Cross-validated model name, specified as a character vector.

For example, `'ECOC'` specifies a cross-validated ECOC model.

Data Types: `char`

`KFold`: Number of cross-validated folds, specified as a positive integer.

Data Types: `double`

`ModelParameters`: Cross-validation parameter values, for example, the name-value pair argument values used to cross-validate the ECOC classifier, specified as an object. `ModelParameters` does not contain estimated parameters.

Access properties of `ModelParameters` using dot notation.

`NumObservations`: Number of observations in the training data, specified as a positive numeric scalar.

Data Types: `double`

`Partition`: Data partition indicating how the software splits the data into cross-validation folds, specified as a `cvpartition` object.

`Trained`: Compact classifiers trained on cross-validation folds, specified as a cell array of `CompactClassificationECOC` models. `Trained` has k cells, where k is the number of folds.

Data Types: `cell`

`W`: Observation weights used to cross-validate the model, specified as a numeric vector. `W` has `NumObservations` elements.

The software normalizes the weights used for training so that `sum(W,'omitnan')` is `1`.

Data Types: `single` | `double`

`Y`: Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `Y` has `NumObservations` elements, and is the same data type as the input argument `Y` that you passed to `fitcecoc` to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.)

Each row of `Y` represents the observed classification of the corresponding observation in the predictor data.

Data Types: `char` | `cell` | `categorical` | `logical` | `single` | `double`

ECOC Properties

`BinaryLoss`: Binary learner loss function, specified as a character vector representing the loss function name.

If you train using binary learners that use different loss functions, then the software sets `BinaryLoss` to `'hamming'`. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the `'BinaryLoss'` name-value pair argument of `kfoldPredict` or `kfoldLoss`.

Data Types: `char`

`BinaryY`: Binary learner class labels, specified as a numeric matrix or `[]`.

• If the coding matrix is the same across folds, then `BinaryY` is a `NumObservations`-by-L matrix, where L is the number of binary learners (`size(CodingMatrix,2)`).

Elements of `BinaryY` are `-1`, `0`, or `1`, and the value corresponds to a dichotomous class assignment. This table describes how learner `j` assigns observation `k` to a dichotomous class corresponding to the value of `BinaryY(k,j)`.

| Value | Dichotomous Class Assignment |
| --- | --- |
| `-1` | Learner `j` assigns observation `k` to a negative class. |
| `0` | Before training, learner `j` removes observation `k` from the data set. |
| `1` | Learner `j` assigns observation `k` to a positive class. |

• If the coding matrix varies across folds, then `BinaryY` is empty (`[]`).

Data Types: `double`

`CodingMatrix`: Codes specifying class assignments for the binary learners, specified as a numeric matrix or `[]`.

• If the coding matrix is the same across folds, then `CodingMatrix` is a K-by-L matrix. K is the number of classes and L is the number of binary learners.

Elements of `CodingMatrix` are `-1`, `0`, or `1`, and the value corresponds to a dichotomous class assignment. This table describes how learner `j` assigns observations in class `i` to a dichotomous class corresponding to the value of `CodingMatrix(i,j)`.

| Value | Dichotomous Class Assignment |
| --- | --- |
| `-1` | Learner `j` assigns observations in class `i` to a negative class. |
| `0` | Before training, learner `j` removes observations in class `i` from the data set. |
| `1` | Learner `j` assigns observations in class `i` to a positive class. |

• If the coding matrix varies across folds, then `CodingMatrix` is empty (`[]`). Obtain the coding matrix for each fold using the `Trained` property. For example, `CVMdl.Trained{1}.CodingMatrix` is the coding matrix in the first fold of the cross-validated ECOC model `CVMdl`.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64`
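To make the relationship between a coding matrix and the binary learner labels concrete, here is a small sketch. It assumes a three-class problem with a one-versus-one design generated by `designecoc`. The class index vector `classIdx` is hypothetical, and the lookup illustrates the definitions above rather than how the software populates `BinaryY` internally.

```matlab
K = 3;                          % Number of classes
M = designecoc(K,'onevsone')    % K-by-L coding matrix with entries -1, 0, or 1
L = size(M,2);                  % Number of binary learners

% Hypothetical class index (row of M) for each of four observations
classIdx = [1; 3; 2; 2];

% Row k holds the dichotomous assignment of observation k for every learner:
% -1 (negative class), 0 (removed before training), or 1 (positive class)
BinaryY = M(classIdx,:)         % NumObservations-by-L matrix
```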

Other Classification Properties

`CategoricalPredictors`: Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

`ClassNames`: Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `ClassNames` has the same data type as the class labels `Y`. (The software treats string arrays as cell arrays of character vectors.) `ClassNames` also determines the class order.

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`

`Cost`: Misclassification costs, specified as a square numeric matrix. `Cost` has K rows and columns, where K is the number of classes.

`Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

`fitcecoc` incorporates misclassification costs differently among different types of binary learners.

Data Types: `double`

`PredictorNames`: Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of `PredictorNames` is equal to the number of variables in the training data `X` or `Tbl` used as predictor variables.

Data Types: `cell`

`Prior`: Prior class probabilities, specified as a numeric vector. `Prior` has as many elements as the number of classes in `ClassNames`, and the order of the elements corresponds to the order of the classes in `ClassNames`.

`fitcecoc` incorporates misclassification costs differently among different types of binary learners.

Data Types: `double`

`ResponseName`: Response variable name, specified as a character vector.

Data Types: `char`

`ScoreTransform`: Score transformation function to apply to predicted scores, specified as a function name or function handle.

For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to `Mdl.Beta` and `Mdl.Bias`, respectively.

To change the score transformation function to, for example, `function`, use dot notation.

• For a built-in function, enter this code and replace `function` with a value in the table.

`Mdl.ScoreTransform = 'function';`

| Value | Description |
| --- | --- |
| `"doublelogit"` | $1/(1 + e^{-2x})$ |
| `"invlogit"` | $\log(x/(1 - x))$ |
| `"ismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| `"logit"` | $1/(1 + e^{-x})$ |
| `"none"` or `"identity"` | $x$ (no transformation) |
| `"sign"` | $-1$ for $x < 0$; $0$ for $x = 0$; $1$ for $x > 0$ |
| `"symmetric"` | $2x - 1$ |
| `"symmetricismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$ |
| `"symmetriclogit"` | $2/(1 + e^{-x}) - 1$ |

• For a MATLAB® function, or a function that you define, enter its function handle.

`Mdl.ScoreTransform = @function;`

`function` must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

Data Types: `char` | `function_handle`
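For reference, several of the built-in transformations in the table can be written as element-wise anonymous functions. This is only a sketch of the formulas, with hypothetical handle names and a made-up score matrix; it does not show how the software evaluates the built-in names.

```matlab
% Element-wise equivalents of some built-in transformation names
logitFcn          = @(x) 1./(1 + exp(-x));      % "logit"
doublelogitFcn    = @(x) 1./(1 + exp(-2*x));    % "doublelogit"
symmetricFcn      = @(x) 2*x - 1;               % "symmetric"
symmetriclogitFcn = @(x) 2./(1 + exp(-x)) - 1;  % "symmetriclogit"

% Hypothetical 2-by-3 matrix of raw scores (rows: observations, columns: classes)
scores = [-2 0 2; 0.5 -0.5 1];

% A valid transformation returns a matrix of the same size as its input
transformed = logitFcn(scores);
```

A handle such as `logitFcn` follows the function-handle pattern shown above, for example `Mdl.ScoreTransform = logitFcn;`.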

## Methods

| Method | Description |
| --- | --- |
| `kfoldEdge` | Classification edge for observations not used for training |
| `kfoldLoss` | Classification loss for observations not used in training |
| `kfoldMargin` | Classification margins for observations not used in training |
| `kfoldPredict` | Predict labels for observations not used for training |
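A brief sketch of calling each method on one cross-validated model follows. It assumes the `fisheriris` sample data set and hypothetical output names; the `'hinge'` binary loss is chosen only to show the `'BinaryLoss'` name-value pair argument.

```matlab
load fisheriris
rng(1); % For reproducibility
CVMdl = fitcecoc(meas,species,'Learners','linear','KFold',5);

label = kfoldPredict(CVMdl);   % Out-of-fold predicted labels
err   = kfoldLoss(CVMdl);      % Out-of-fold classification loss
m     = kfoldMargin(CVMdl);    % Out-of-fold classification margins
e     = kfoldEdge(CVMdl);      % Out-of-fold classification edge

% Same loss computation with a different binary learner loss function
errHinge = kfoldLoss(CVMdl,'BinaryLoss','hinge');
```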

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

## Examples


`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels.

Cross-validate a multiclass, linear classification model that can identify which MATLAB® toolbox a documentation web page is from based on counts of words on the page.

```
rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','linear','CrossVal','on')
```
```
CVMdl = 
  ClassificationPartitionedLinearECOC
    CrossValidatedModel: 'LinearECOC'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: [comm    dsp    ecoder    fixedpoint    ...    ]
         ScoreTransform: 'none'

  Properties, Methods
```

`CVMdl` is a `ClassificationPartitionedLinearECOC` cross-validated model. Because `fitcecoc` implements 10-fold cross-validation by default, `CVMdl.Trained` is a 10-by-1 cell vector of `CompactClassificationECOC` models, one for each fold. Each of these is an ECOC model composed of binary, linear classification models.

Estimate labels for out-of-fold observations and estimate the generalization error by passing `CVMdl` to `kfoldPredict` and `kfoldLoss`, respectively.

```
oofLabels = kfoldPredict(CVMdl);
ge = kfoldLoss(CVMdl)
```
```
ge = 0.0958
```

The estimated generalization error is about 10% misclassified observations.

To improve generalization error, try specifying another solver, such as LBFGS. To change default options when training ECOC models composed of linear classification models, create a linear classification model template using `templateLinear`, and then pass the template to `fitcecoc`.
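As a sketch of that suggestion, the following creates a template that requests the LBFGS solver and passes it to `fitcecoc`. The variable names `tLBFGS`, `CVMdlLBFGS`, and `geLBFGS` are hypothetical, all other template options are left at their defaults, and whether the error actually improves depends on the data.

```matlab
load nlpdata
rng(1); % For reproducibility

% Linear classification model template that uses the LBFGS solver
tLBFGS = templateLinear('Solver','lbfgs');

% Cross-validate the ECOC model built from the template
CVMdlLBFGS = fitcecoc(X,Y,'Learners',tLBFGS,'CrossVal','on');

% Compare this value against the generalization error of the default solver
geLBFGS = kfoldLoss(CVMdlLBFGS)
```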

To determine a good lasso-penalty strength for an ECOC model composed of linear classification models that use logistic regression learners, implement 5-fold cross-validation.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels.

For simplicity, use the label 'others' for all observations in `Y` that are not `'simulink'`, `'dsp'`, or `'comm'`.

`Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';`

Create a set of 11 logarithmically spaced regularization strengths from $10^{-7}$ through $10^{-2}$.

`Lambda = logspace(-7,-2,11);`

Create a linear classification model template that specifies to use logistic regression learners, use lasso penalties with strengths in `Lambda`, train using SpaRSA, and lower the tolerance on the gradient of the objective function to `1e-8`.

```
t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);
```

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns.

```
X = X';
rng(10); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5);
```

`CVMdl` is a `ClassificationPartitionedLinearECOC` model.

Dissect `CVMdl`, and each model within it.

`numECOCModels = numel(CVMdl.Trained)`
```
numECOCModels = 5
```
`ECOCMdl1 = CVMdl.Trained{1}`
```
ECOCMdl1 = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm    dsp    simulink    others]
    ScoreTransform: 'none'
    BinaryLearners: {6×1 cell}
      CodingMatrix: [4×6 double]

  Properties, Methods
```
`numCLModels = numel(ECOCMdl1.BinaryLearners)`
```
numCLModels = 6
```
`CLMdl1 = ECOCMdl1.BinaryLearners{1}`
```
CLMdl1 = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [-1 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-0.3169 -0.3169 -0.3168 -0.3168 -0.3168 -0.3167 -0.1725 -0.0805 -0.1762 -0.3450 -0.5174]
            Lambda: [1.0000e-07 3.1623e-07 1.0000e-06 3.1623e-06 1.0000e-05 3.1623e-05 1.0000e-04 3.1623e-04 1.0000e-03 0.0032 0.0100]
           Learner: 'logistic'

  Properties, Methods
```

Because `fitcecoc` implements 5-fold cross-validation, `CVMdl.Trained` is a 5-by-1 cell array of `CompactClassificationECOC` models, one trained on each fold. The `BinaryLearners` property of each `CompactClassificationECOC` model contains the `ClassificationLinear` models. The number of `ClassificationLinear` models within each compact ECOC model depends on the number of distinct labels and the coding design. Because `Lambda` is a sequence of regularization strengths, you can think of `CLMdl1` as 11 models, one for each regularization strength in `Lambda`.

Determine how well the models generalize by plotting the averages of the 5-fold classification error for each regularization strength. Identify the regularization strength that minimizes the generalization error over the grid.

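```
ce = kfoldLoss(CVMdl);            % 5-fold classification error for each Lambda
figure;
plot(log10(Lambda),log10(ce))
[~,minCEIdx] = min(ce);           % Index of the smallest classification error
minLambda = Lambda(minCEIdx);     % Corresponding regularization strength
hold on
plot(log10(minLambda),log10(ce(minCEIdx)),'ro');
ylabel('log_{10} 5-fold classification error')
xlabel('log_{10} Lambda')
legend('Classification error','Min classification error')
hold off
```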

Train an ECOC model composed of linear classification models using the entire data set, and specify the regularization strength that minimizes the classification error (`minLambda`).

```
t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',minLambda,'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');
```

To estimate labels for new observations, pass `MdlFinal` and the new data to `predict`.
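A self-contained sketch of that last step is shown here. Because no separate new data accompanies this page, it holds out the last five observations of `nlpdata` as a stand-in (`XNew`, hypothetical) and trains on the rest. The regularization strength `1e-5` is an assumed value used only to keep the sketch short; in practice, use the strength selected by cross-validation.

```matlab
load nlpdata
X = X';                               % Orient observations in columns
XNew = X(:,end-4:end);                % Stand-in for five new observations

% Assumed regularization strength, for illustration only
t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',1e-5);

% Train on all observations except the five held out above
Mdl = fitcecoc(X(:,1:end-5),Y(1:end-5),'Learners',t,'ObservationsIn','columns');

% The new data uses the same orientation as the training data
newLabels = predict(Mdl,XNew,'ObservationsIn','columns')
```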

Introduced in R2016a