loss

Class: classreg.learning.regr.CompactRegressionSVM, RegressionSVM
Namespace: classreg.learning.regr

Regression error for support vector machine regression model

Syntax

L = loss(mdl,Tbl,ResponseVarName)
L = loss(mdl,Tbl,Y)
L = loss(mdl,X,Y)
L = loss(___,Name,Value)

Description

L = loss(mdl,Tbl,ResponseVarName) returns the loss for the predictions of the support vector machine (SVM) regression model, mdl, based on the predictor data in the table Tbl and the true response values in Tbl.ResponseVarName.

L = loss(mdl,Tbl,Y) returns the loss for the predictions of the support vector machine (SVM) regression model, mdl, based on the predictor data in the table Tbl and the true response values in the vector Y.

L = loss(mdl,X,Y) returns the loss for the predictions of the support vector machine (SVM) regression model, mdl, based on the predictor data in X and the true responses in Y.

L = loss(___,Name,Value) returns the loss with additional options specified by one or more name-value arguments, using any of the previous syntaxes. For example, you can specify the loss function or observation weights.
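The syntaxes differ only in how the predictors and responses are supplied. The following sketch assumes the carsmall sample data and a model trained on a table. It computes the loss with the table-plus-name form, the table-plus-vector form, and a name-value argument. It evaluates the loss on the training data only to keep the example short; in practice, pass held-out data.

```matlab
% Build a table of predictors and response, dropping rows with missing values
load carsmall
Tbl = rmmissing(table(Horsepower,Weight,MPG));

% Train an SVM regression model on the table; MPG is the response
mdl = fitrsvm(Tbl,'MPG','Standardize',true);

% Response given by its variable name in Tbl
L1 = loss(mdl,Tbl,'MPG');

% Response given as a separate vector (Tbl may still contain the MPG column)
L2 = loss(mdl,Tbl,Tbl.MPG);

% Either form combined with a name-value argument
L3 = loss(mdl,Tbl,'MPG','LossFun','epsiloninsensitive');
```

L1 and L2 should be identical because both use the same predictions and the same observed responses.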

Input Arguments


`mdl` — SVM regression model
`RegressionSVM` model | `CompactRegressionSVM` model

SVM regression model, specified as a RegressionSVM model or CompactRegressionSVM model returned by fitrsvm or compact, respectively.
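A compact model discards the training data but keeps everything prediction needs, so loss should return the same value for a full model and its compact version. A minimal sketch, assuming the carsmall sample data:

```matlab
% Predictors and response with incomplete rows removed
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);

mdl  = fitrsvm(X,Y,'Standardize',true);   % full RegressionSVM model
cmdl = compact(mdl);                      % CompactRegressionSVM model

Lfull    = loss(mdl,X,Y);
Lcompact = loss(cmdl,X,Y);
```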

`Tbl` — Sample data
table

Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for the response variable and observation weights. Tbl must contain all of the predictors used to train mdl. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If you trained mdl using sample data contained in a table, then the input data for this method must also be in a table.

Data Types: table

`ResponseVarName` — Response variable name
name of a variable in `Tbl`

Response variable name, specified as the name of a variable in Tbl. The response variable must be a numeric vector.

You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify ResponseVarName as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

Data Types: char | string

`X` — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix or table. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

If you trained mdl using a matrix of predictor values, then X must be a numeric matrix with p columns. p is the number of predictors used to train mdl.

The length of Y and the number of rows of X must be equal.

Data Types: single | double

`Y` — Observed response values
vector of numeric values

Observed response values, specified as a vector of length n containing numeric values. Each entry in Y is the observed response based on the predictor data in the corresponding row of X.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

`LossFun` — Loss function
`'mse'` (default) | `'epsiloninsensitive'` | function handle

Loss function, specified as 'mse', 'epsiloninsensitive', or a function handle.

The following table lists the available loss functions.

Value	Loss Function
`'mse'`	Weighted Mean Squared Error
`'epsiloninsensitive'`	Epsilon-Insensitive Loss Function
Function handle	Custom loss function

To specify your own loss function, use function handle notation. Your function must have the signature lossvalue = lossfun(Y,Yfit,W), where:
- The output argument lossvalue is a scalar value.
- You choose the function name (lossfun).
- Y is an n-by-1 numeric vector of observed response values.
- Yfit is an n-by-1 numeric vector of predicted response values, calculated using the corresponding predictor values in X (similar to the output of predict).
- W is an n-by-1 numeric vector of observation weights. The software normalizes the weights to sum to 1.

Specify your function using 'LossFun',@lossfun.

Example: 'LossFun','epsiloninsensitive'

Data Types: char | string | function_handle
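As a check on the function-handle signature, a custom function that reproduces the weighted mean squared error should agree with the built-in 'mse' option. The handle names weightedMSE and weightedMAE are hypothetical; any names work. The data set and model are assumptions made for illustration.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);
mdl = fitrsvm(X,Y,'Standardize',true);

% Hypothetical custom loss with the required signature lossvalue = f(Y,Yfit,W).
% Dividing by sum(W) is harmless because W already sums to 1.
weightedMSE = @(Y,Yfit,W) sum(W.*(Y - Yfit).^2)/sum(W);

% Hypothetical weighted mean absolute error, a loss with no built-in option
weightedMAE = @(Y,Yfit,W) sum(W.*abs(Y - Yfit))/sum(W);

Lbuiltin = loss(mdl,X,Y);                        % default 'mse'
Lcustom  = loss(mdl,X,Y,'LossFun',weightedMSE);
Lmae     = loss(mdl,X,Y,'LossFun',weightedMAE);
```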

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values
`"median"` (default) | `"mean"` | `"omitted"` | numeric scalar

Since R2023b

Predicted response value to use for observations with missing predictor values, specified as "median", "mean", "omitted", or a numeric scalar.

Value	Description
`"median"`	`loss` uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values.
`"mean"`	`loss` uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values.
`"omitted"`	`loss` excludes observations with missing predictor values from the loss computation.
Numeric scalar	`loss` uses this value as the predicted response value for observations with missing predictor values.

If an observation is missing an observed response value or an observation weight, then loss does not use the observation in the loss computation.

Example: PredictionForMissingValue="omitted"

Data Types: single | double | char | string
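The options can be compared on data that still contains missing values. This sketch assumes R2023b or later and the carsmall sample data. The fixed prediction of 20 is an arbitrary value chosen only for illustration. With "omitted", the result should match the loss computed on the complete rows alone.

```matlab
load carsmall
X = [Horsepower Weight];
Y = MPG;

% fitrsvm removes rows with missing values before training
mdl = fitrsvm(X,Y,'Standardize',true);

% Evaluate on data that may still contain NaN values
Lmedian = loss(mdl,X,Y);                                   % default: training median
Lmean   = loss(mdl,X,Y,PredictionForMissingValue="mean");
Lomit   = loss(mdl,X,Y,PredictionForMissingValue="omitted");
Lconst  = loss(mdl,X,Y,PredictionForMissingValue=20);     % arbitrary fixed prediction

% "omitted" should match evaluating only the complete rows
complete  = ~any(isnan([X Y]),2);
Lcomplete = loss(mdl,X(complete,:),Y(complete));
```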

`Weights` — Observation weights
`ones(size(X,1),1)` (default) | numeric vector

Observation weights, specified as a numeric vector. Weights must have the same length as the number of rows in X. The software weights the observation in each row of X by the corresponding value in Weights.

Weights are normalized to sum to 1.

Data Types: single | double
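Because the weights are normalized to sum to 1, multiplying every weight by the same positive constant should leave the loss unchanged. In this sketch, counting model-year-82 cars twice is an arbitrary choice made only for illustration.

```matlab
load carsmall
R = rmmissing([Horsepower Weight Model_Year MPG]);
X  = R(:,1:2);
yr = R(:,3);
Y  = R(:,4);
mdl = fitrsvm(X,Y,'Standardize',true);

% Arbitrary illustrative weights: count model-year-82 cars twice
w = ones(size(Y));
w(yr == 82) = 2;

Lw      = loss(mdl,X,Y,'Weights',w);
Lscaled = loss(mdl,X,Y,'Weights',10*w);   % same relative weights, scaled by 10
```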

Output Arguments


`L` — Regression loss
scalar value

Regression loss, returned as a scalar value.

Examples


Calculate Test Sample Loss for SVM Regression Model


Calculate the test set mean squared error (MSE) and epsilon-insensitive error of an SVM regression model.

Load the carsmall sample data. Specify Horsepower and Weight as the predictor variables (X), and MPG as the response variable (Y).

load carsmall
X = [Horsepower,Weight];
Y = MPG;

Delete rows of X and Y where either array has NaN values.

R = rmmissing([X Y]);
X = R(:,1:2);
Y = R(:,end);

Reserve 10% of the observations as a holdout sample, and extract the training and test indices.

rng default  % For reproducibility
N = length(Y);
cv = cvpartition(N,'HoldOut',0.10);
trainInds = training(cv);
testInds = test(cv);

Specify the training and test data sets.

XTrain = X(trainInds,:);
YTrain = Y(trainInds);
XTest = X(testInds,:);
YTest = Y(testInds);

Train a linear SVM regression model and standardize the data.

mdl = fitrsvm(XTrain,YTrain,'Standardize',true)

mdl = 
  RegressionSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
                    Alpha: [68x1 double]
                     Bias: 23.0248
         KernelParameters: [1x1 struct]
                       Mu: [108.8810 2.9419e+03]
                    Sigma: [44.4943 805.1412]
          NumObservations: 84
           BoxConstraints: [84x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [84x1 logical]
                   Solver: 'SMO'

mdl is a RegressionSVM model.

Determine how well the trained model generalizes to new predictor values by estimating the test sample mean squared error and epsilon-insensitive error.

lossMSE = loss(mdl,XTest,YTest)

lossMSE = 
32.0268

lossEI = loss(mdl,XTest,YTest,'LossFun','epsiloninsensitive')

lossEI = 
3.2919

More About


Weighted Mean Squared Error

The weighted mean squared error is calculated as follows:

$\text{mse} = \frac{\sum_{j=1}^{n} w_{j} \left(f(x_{j}) - y_{j}\right)^{2}}{\sum_{j=1}^{n} w_{j}},$

where:

n is the number of rows of data
x_j is the jth row of data
y_j is the true response to x_j
f(x_j) is the response prediction of the SVM regression model mdl to x_j
w is the vector of weights.

The weights in w are all equal to one by default. You can specify different values for weights using the 'Weights' name-value pair argument. If you specify weights, each value is divided by the sum of all weights, such that the normalized weights add to one.
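The formula can be checked against loss by computing the predictions f(x_j) with predict and forming the weighted average by hand. The random positive weights in this sketch are an arbitrary choice for illustration.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);
mdl = fitrsvm(X,Y,'Standardize',true);

rng(0)                          % reproducible illustrative weights
w = rand(size(Y)) + 0.5;        % arbitrary positive weights

Yfit = predict(mdl,X);          % f(x_j) for each row
mseByHand = sum(w.*(Yfit - Y).^2)/sum(w);
mseLoss   = loss(mdl,X,Y,'Weights',w);
```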

Epsilon-Insensitive Loss Function

The epsilon-insensitive loss function ignores errors that are within the distance epsilon (ε) of the function value. It is formally described as:

$\text{Loss}_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise.} \end{cases}$

The mean epsilon-insensitive loss is calculated as follows:

$\text{Loss} = \frac{\sum_{j=1}^{n} w_{j} \max\left(0, |y_{j} - f(x_{j})| - \varepsilon\right)}{\sum_{j=1}^{n} w_{j}},$

where:

n is the number of rows of data
x_j is the jth row of data
y_j is the true response to x_j
f(x_j) is the response prediction of the SVM regression model mdl to x_j
w is the vector of weights.
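A by-hand version of this formula can be compared with the 'epsiloninsensitive' option. This sketch assumes that loss uses the trained model's Epsilon property as ε, and uses the default equal weights.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);
mdl = fitrsvm(X,Y,'Standardize',true);

epsilon = mdl.Epsilon;          % assumed: the ε used by loss (half-width of the band)
Yfit = predict(mdl,X);
w = ones(size(Y));              % default equal weights

% Errors inside the band contribute 0; larger errors contribute |y - f(x)| - ε
eiByHand = sum(w.*max(0,abs(Y - Yfit) - epsilon))/sum(w);
eiLoss   = loss(mdl,X,Y,'LossFun','epsiloninsensitive');
```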

Tips

If mdl is a cross-validated RegressionPartitionedSVM model, use kfoldLoss instead of loss to calculate the regression error.
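For example, a cross-validated model can be trained directly with the 'KFold' option of fitrsvm and then passed to kfoldLoss. The choice of 5 folds and the carsmall data are assumptions made for illustration.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);

rng default                                          % reproducible folds
cvmdl = fitrsvm(X,Y,'Standardize',true,'KFold',5);   % RegressionPartitionedSVM

Lcv = kfoldLoss(cvmdl);   % cross-validated loss using the default 'mse'
```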

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The loss function fully supports tall arrays. For more information, see Tall Arrays.
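A minimal sketch of the tall workflow, assuming a model trained in memory and small in-memory data wrapped with tall for illustration; real use would typically create the tall arrays from a datastore. The result stays unevaluated until gather is called.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);
mdl = fitrsvm(X,Y,'Standardize',true);   % trained in memory

% Wrap in-memory arrays as tall arrays (illustration only)
tX = tall(X);
tY = tall(Y);

tL    = loss(mdl,tX,tY);   % deferred tall result
Ltall = gather(tL);        % trigger evaluation
Lmem  = loss(mdl,X,Y);     % in-memory result for comparison
```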

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. (since R2023a)

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
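A minimal sketch, assuming a model trained on the CPU and test data moved to the GPU with gpuArray. The canUseGPU check and the CPU fallback are only there so the script also runs on machines without a supported GPU.

```matlab
load carsmall
R = rmmissing([Horsepower Weight MPG]);
X = R(:,1:2);
Y = R(:,3);
mdl = fitrsvm(X,Y,'Standardize',true);

Lcpu = loss(mdl,X,Y);
if canUseGPU()
    % Compute on the GPU, then bring the scalar back to host memory
    Lgpu = gather(loss(mdl,gpuArray(X),gpuArray(Y)));
else
    Lgpu = Lcpu;   % fallback when no supported GPU is available
end
```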

Version History

Introduced in R2015b


R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type	Model Objects	Object Functions
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`, `predict`, `resubLoss`, `resubPredict`
Gaussian process regression (GPR) model	`RegressionPartitionedGP`	`kfoldLoss`, `kfoldPredict`
Gaussian kernel regression model	`RegressionKernel`	`loss`, `predict`
Gaussian kernel regression model	`RegressionPartitionedKernel`	`kfoldLoss`, `kfoldPredict`
Linear regression model	`RegressionLinear`	`loss`, `predict`
Linear regression model	`RegressionPartitionedLinear`	`kfoldLoss`, `kfoldPredict`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`, `predict`, `resubLoss`, `resubPredict`
Neural network regression model	`RegressionPartitionedNeuralNetwork`	`kfoldLoss`, `kfoldPredict`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`, `predict`, `resubLoss`, `resubPredict`
Support vector machine (SVM) regression model	`RegressionPartitionedSVM`	`kfoldLoss`, `kfoldPredict`

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.

R2023a: GPU array support

Starting in R2023a, loss fully supports GPU arrays.

R2022a: `loss` can return NaN for predictor data with missing values

The loss function no longer omits an observation with a NaN prediction when computing the weighted average regression loss. Therefore, loss can now return NaN when the predictor data X or the predictor variables in Tbl contain any missing values. In most cases, if the test set observations do not contain missing predictors, the loss function does not return NaN.

This change improves the automatic selection of a regression model when you use fitrauto. Before this change, the software might select a model (expected to best predict the responses for new data) with few non-NaN predictors.

If loss in your code returns NaN, you can update your code to avoid this result. Remove or replace the missing values by using rmmissing or fillmissing, respectively.
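Both remedies are sketched below on the carsmall data. Filling missing predictors with the column medians is one arbitrary choice among many fill methods.

```matlab
load carsmall
X = [Horsepower Weight];
Y = MPG;
mdl = fitrsvm(X,Y,'Standardize',true);   % fitrsvm drops incomplete rows

% Option 1: remove rows with any missing predictor or response
R = rmmissing([X Y]);
Lremoved = loss(mdl,R(:,1:end-1),R(:,end));

% Option 2: fill missing predictors with the column medians (arbitrary choice)
Xfilled = fillmissing(X,'constant',median(X,'omitnan'));
Lfilled = loss(mdl,Xfilled,Y);   % observations with a missing Y are not used
```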

The following table shows the regression models for which the loss object function might return NaN. For more details, see the Compatibility Considerations for each loss function.

Model Type	Full or Compact Model Object	`loss` Object Function
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`
Gaussian kernel regression model	`RegressionKernel`	`loss`
Linear regression model	`RegressionLinear`	`loss`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`
