# ClassificationLinear class

Linear model for binary classification of high-dimensional data

## Description

`ClassificationLinear` is a trained linear model object for binary classification; the linear model is a support vector machine (SVM) or logistic regression model. `fitclinear` fits a `ClassificationLinear` model by minimizing the objective function using techniques that reduce computation time for high-dimensional data sets (e.g., stochastic gradient descent). The classification loss plus the regularization term compose the objective function.

Unlike other classification models, and for economical memory usage, `ClassificationLinear` model objects do not store the training data. However, they do store, for example, the estimated linear model coefficients, prior-class probabilities, and the regularization strength.

You can use trained `ClassificationLinear` models to predict labels or classification scores for new data. For details, see `predict`.

## Construction

Create a `ClassificationLinear` object by using `fitclinear`.

## Properties


#### Linear Classification Properties

**`Lambda`**

Regularization term strength, specified as a nonnegative scalar or a vector of nonnegative values.

Data Types: `double` | `single`

**`Learner`**

Linear classification model type, specified as `'logistic'` or `'svm'`.

In this table, $f\left(x\right)=x\beta +b.$

• β is a vector of p coefficients.

• x is an observation from p predictor variables.

• b is the scalar bias.

| Value | Algorithm | Loss Function | `FittedLoss` Value |
| --- | --- | --- | --- |
| `'logistic'` | Logistic regression | Deviance (logistic): $\ell \left[y,f\left(x\right)\right]=\mathrm{log}\left\{1+\mathrm{exp}\left[-yf\left(x\right)\right]\right\}$ | `'logit'` |
| `'svm'` | Support vector machine | Hinge: $\ell \left[y,f\left(x\right)\right]=\mathrm{max}\left[0,1-yf\left(x\right)\right]$ | `'hinge'` |
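As a sketch of how the `Learner` value maps to a per-observation loss (assuming a trained model `Mdl`, a row observation `x` with one value per predictor, and a label `y` coded as -1 or 1):

```matlab
% Sketch: evaluate the raw linear score f(x) and the per-observation
% loss that corresponds to Mdl.Learner. Mdl, x, and y are assumed to
% exist; y is coded as -1 or 1.
f = x*Mdl.Beta + Mdl.Bias;        % f(x) = x*beta + b
if strcmp(Mdl.Learner,'svm')
    lossVal = max(0,1 - y*f);     % hinge loss
else
    lossVal = log(1 + exp(-y*f)); % logistic (deviance) loss
end
```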

**`Beta`**

Linear coefficient estimates, specified as a numeric vector with length equal to the number of predictors.

Data Types: `double`

**`Bias`**

Estimated bias term or model intercept, specified as a numeric scalar.

Data Types: `double`

**`FittedLoss`**

Loss function used to fit the linear model, specified as `'hinge'` or `'logit'`.

| Value | Algorithm | Loss Function | `Learner` Value |
| --- | --- | --- | --- |
| `'hinge'` | Support vector machine | Hinge: $\ell \left[y,f\left(x\right)\right]=\mathrm{max}\left[0,1-yf\left(x\right)\right]$ | `'svm'` |
| `'logit'` | Logistic regression | Deviance (logistic): $\ell \left[y,f\left(x\right)\right]=\mathrm{log}\left\{1+\mathrm{exp}\left[-yf\left(x\right)\right]\right\}$ | `'logistic'` |

**`Regularization`**

Complexity penalty type, specified as `'lasso (L1)'` or `'ridge (L2)'`.

The software composes the objective function for minimization from the sum of the average loss function (see `FittedLoss`) and a regularization value from this table.

| Value | Description |
| --- | --- |
| `'lasso (L1)'` | Lasso (L1) penalty: $\lambda \sum _{j=1}^{p}\lvert {\beta }_{j}\rvert$ |
| `'ridge (L2)'` | Ridge (L2) penalty: $\frac{\lambda }{2}\sum _{j=1}^{p}{\beta }_{j}^{2}$ |

λ specifies the regularization term strength (see `Lambda`).

The software excludes the bias term (β0) from the regularization penalty.
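As a sketch, the regularization value can be reconstructed from the stored properties (assuming a trained model `Mdl` with a scalar `Lambda`; the property names `Lambda`, `Beta`, and `Regularization` are the ones described above):

```matlab
% Sketch: recompute the penalty term from the stored model properties.
% The bias (Mdl.Bias) is deliberately left out, matching the text above.
switch Mdl.Regularization
    case 'ridge (L2)'
        penalty = (Mdl.Lambda/2)*sum(Mdl.Beta.^2);
    case 'lasso (L1)'
        penalty = Mdl.Lambda*sum(abs(Mdl.Beta));
end
```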

#### Other Classification Properties

**`CategoricalPredictors`**

Indices of categorical predictors, whose value is always empty (`[]`) because a `ClassificationLinear` model does not support categorical predictors.

**`ClassNames`**

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `ClassNames` has the same data type as the class labels `Y`. (The software treats string arrays as cell arrays of character vectors.) `ClassNames` also determines the class order.

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`

**`Cost`**

Misclassification costs, specified as a square numeric matrix. `Cost` has K rows and columns, where K is the number of classes.

`Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

Data Types: `double`
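For a binary model trained with default settings, `Cost` is the 0-1 cost matrix. A minimal sketch (assuming a trained binary model `Mdl`):

```matlab
% Sketch: the default misclassification cost matrix charges 0 for a
% correct classification and 1 for an error.
K = numel(Mdl.ClassNames);       % K = 2 for binary classification
defaultCost = ones(K) - eye(K);  % [0 1; 1 0]
isequal(Mdl.Cost,defaultCost)    % true if no custom costs were supplied
```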

**`ModelParameters`**

Parameters used for training the `ClassificationLinear` model, specified as a structure.

Access fields of `ModelParameters` using dot notation. For example, access the relative tolerance on the linear coefficients and the bias term by using `Mdl.ModelParameters.BetaTolerance`.

Data Types: `struct`

**`PredictorNames`**

Predictor names in order of their appearance in the predictor data `X`, specified as a cell array of character vectors. The length of `PredictorNames` is equal to the number of columns in `X`.

Data Types: `cell`

**`ExpandedPredictorNames`**

Expanded predictor names, specified as a cell array of character vectors.

Because a `ClassificationLinear` model does not support categorical predictors, `ExpandedPredictorNames` and `PredictorNames` are equal.

Data Types: `cell`

**`Prior`**

Prior class probabilities, specified as a numeric vector. `Prior` has as many elements as classes in `ClassNames`, and the order of the elements corresponds to the elements of `ClassNames`.

Data Types: `double`

**`ScoreTransform`**

Score transformation function to apply to predicted scores, specified as a function name or function handle.

For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to `Mdl.Beta` and `Mdl.Bias`, respectively.
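A minimal sketch of this computation (assuming a trained model `Mdl` with the default `'none'` transformation and a matrix `Xnew` of predictor rows; `scores(:,2)` corresponds to the second class in `Mdl.ClassNames`):

```matlab
% Sketch: recompute the raw classification scores by hand and compare
% with the scores that predict returns.
f = Xnew*Mdl.Beta + Mdl.Bias;        % f(x) = x*beta + b, one row per observation
[labels,scores] = predict(Mdl,Xnew); % scores(:,2) should match f
```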

To change the score transformation function to, for example, `function`, use dot notation.

• For a built-in function, enter this code and replace `function` with a value in the table.

`Mdl.ScoreTransform = 'function';`

| Value | Description |
| --- | --- |
| `'doublelogit'` | $1/(1+e^{-2x})$ |
| `'invlogit'` | $\mathrm{log}\left(x/(1-x)\right)$ |
| `'ismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| `'logit'` | $1/(1+e^{-x})$ |
| `'none'` or `'identity'` | $x$ (no transformation) |
| `'sign'` | $-1$ for $x<0$; $0$ for $x=0$; $1$ for $x>0$ |
| `'symmetric'` | $2x-1$ |
| `'symmetricismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$ |
| `'symmetriclogit'` | $2/(1+e^{-x})-1$ |

• For a MATLAB® function, or a function that you define, enter its function handle.

`Mdl.ScoreTransform = @function;`

`function` must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

Data Types: `char` | `function_handle`
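For example, a hand-written logistic transform, equivalent to the built-in `'logit'` value (a sketch; any handle that maps a score matrix to a same-size matrix works):

```matlab
% Sketch: an anonymous function used as a custom score transformation.
Mdl.ScoreTransform = @(s) 1./(1 + exp(-s));
```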

**`ResponseName`**

Response variable name, specified as a character vector.

Data Types: `char`

## Methods

| Method | Description |
| --- | --- |
| `edge` | Classification edge for linear classification models |
| `loss` | Classification loss for linear classification models |
| `margin` | Classification margins for linear classification models |
| `predict` | Predict labels for linear classification models |
| `selectModels` | Choose subset of regularized, binary linear classification models |

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

## Examples


Train a binary, linear classification model using support vector machines, dual SGD, and ridge regularization.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels. There are more than two classes in the data.

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

`Ystats = Y == 'stats';`

Train a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. Train the model using the entire data set. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary.

```matlab
rng(1); % For reproducibility
[Mdl,FitInfo] = fitclinear(X,Ystats)
```
```
Mdl = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [34023x1 double]
              Bias: -1.0059
            Lambda: 3.1674e-05
           Learner: 'svm'

  Properties, Methods
```
```
FitInfo = struct with fields:
                    Lambda: 3.1674e-05
                 Objective: 5.3783e-04
                 PassLimit: 10
                 NumPasses: 10
                BatchLimit: []
             NumIterations: 238561
              GradientNorm: NaN
         GradientTolerance: 0
      RelativeChangeInBeta: 0.0562
             BetaTolerance: 1.0000e-04
             DeltaGradient: 1.4582
    DeltaGradientTolerance: 1
           TerminationCode: 0
         TerminationStatus: {'Iteration limit exceeded.'}
                     Alpha: [31572x1 double]
                   History: []
                   FitTime: 0.1603
                    Solver: {'dual'}
```

`Mdl` is a `ClassificationLinear` model. You can pass `Mdl` and the training or new data to `loss` to inspect the in-sample classification error. Or, you can pass `Mdl` and new predictor data to `predict` to predict class labels for new observations.

`FitInfo` is a structure array containing, among other things, the termination status (`TerminationStatus`) and how long the solver took to fit the model to the data (`FitTime`). It is good practice to use `FitInfo` to determine whether optimization-termination measurements are satisfactory. Because training time is small, you can try to retrain the model, but increase the number of passes through the data. This can improve measures like `DeltaGradient`.
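A sketch of such a retraining run (`'PassLimit'` is the `fitclinear` name-value argument that caps the number of passes through the data):

```matlab
% Sketch: retrain with a larger pass limit to try to improve convergence
% measures such as DeltaGradient.
rng(1); % For reproducibility
[Mdl,FitInfo] = fitclinear(X,Ystats,'PassLimit',20);
FitInfo.TerminationStatus
```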

Load the NLP data set.

```matlab
load nlpdata
n = size(X,1); % Number of observations
```

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

`Ystats = Y == 'stats';`

Hold out 5% of the data.

```matlab
rng(1); % For reproducibility
cvp = cvpartition(n,'Holdout',0.05)
```
```
cvp = 
Hold-out cross validation partition
   NumObservations: 31572
       NumTestSets: 1
         TrainSize: 29994
          TestSize: 1578
```

`cvp` is a `CVPartition` object that defines the random partition of n data into training and test sets.

Train a binary, linear classification model using the training set that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. For faster training time, orient the predictor data matrix so that the observations are in columns.

```matlab
idxTrain = training(cvp); % Extract training set indices
X = X';
Mdl = fitclinear(X(:,idxTrain),Ystats(idxTrain),'ObservationsIn','columns');
```

Predict the labels for the holdout sample and estimate the classification error.

```matlab
idxTest = test(cvp); % Extract test set indices
labels = predict(Mdl,X(:,idxTest),'ObservationsIn','columns');
L = loss(Mdl,X(:,idxTest),Ystats(idxTest),'ObservationsIn','columns')
```
```
L = 7.1753e-04
```

`Mdl` misclassifies fewer than 1% of the out-of-sample observations.