You can use the Regression Learner app to automatically train a selection of different models on your data. Use automated training to quickly try a selection of model types, and then explore promising models interactively. To get started, try these options first:

Get Started Regression Model Buttons | Description |
---|---|
All Quick-To-Train | Try the **All Quick-To-Train** button first. The app trains all model types that are typically quick to train. |
All | Use the **All** button to train all available model types. The app trains every type regardless of any previously trained models. This option can be time-consuming. |

To learn more about automated model training, see Automated Regression Model Training.

If you want to explore models one at a time, or if you already
know what model type you want, you can select individual models or
train a group of the same type. To see all available regression model
options, on the **Regression Learner** tab, click
the arrow in the **Model Type** section to expand
the list of regression models. The options in the gallery are preset
starting points with different settings, suitable for a range of different
regression problems.

For help choosing the best model type for your problem, see the tables showing typical characteristics of different regression model types. Decide on the tradeoff you want in speed, flexibility, and interpretability. The best model type depends on your data.

To avoid overfitting, look for a less flexible model that provides sufficient accuracy. For example, look for simple models, such as regression trees, that are fast and easy to interpret. If the models are not accurate enough in predicting the response, choose other models with higher flexibility, such as ensembles. To control flexibility, see the details for each model type.

**Characteristics of Regression Model Types**

Regression Model Type | Interpretability |
---|---|
Linear Regression Models | Easy |
Regression Trees | Easy |
Support Vector Machines | Easy for linear SVMs. Hard for other kernels. |
Gaussian Process Regression Models | Hard |
Ensembles of Trees | Hard |

To read a description of each model in Regression Learner, switch to the details view in the list of all model presets.

The options in the **Model Type** gallery are
preset starting points with different settings. After you choose a
model type, such as regression trees, try training all the presets
to see which one produces the best model with your data.

For workflow instructions, see Train Regression Models in Regression Learner App.

In Regression Learner, all model types support categorical predictors.

If you have categorical predictors with many unique values, training linear models with interaction or quadratic terms and stepwise linear models can use a lot of memory. If the model fails to train, try removing these categorical predictors.

Linear regression models have predictors that are linear in the model parameters, are easy to interpret, and are fast for making predictions. These characteristics make linear regression models popular models to try first. However, the highly constrained form of these models means that they often have low predictive accuracy. After fitting a linear regression model, try creating more flexible models, such as regression trees, and compare the results.

In the **Model Type** gallery, click **All
Linear** to try each of the linear
regression options and see which settings produce the best model with
your data. Select the best model in the History list and try to improve
that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility |
---|---|---|
Linear | Easy | Very low |
Interactions Linear | Easy | Medium |
Robust Linear | Easy | Very low. Less sensitive to outliers, but can be slow to train. |
Stepwise Linear | Easy | Medium |

For a workflow example, see Train Regression Trees Using Regression Learner App.

Regression Learner uses the `fitlm` function to train Linear, Interactions Linear, and Robust Linear models. The app uses the `stepwiselm` function to train Stepwise Linear models.

For Linear, Interactions Linear, and Robust Linear models, you can set these options:

- **Terms**: Specify which terms to use in the linear model. You can choose from:
  - `Linear`. A constant term and linear terms in the predictors
  - `Interactions`. A constant term, linear terms, and interaction terms between the predictors
  - `Pure Quadratic`. A constant term, linear terms, and terms that are purely quadratic in each of the predictors
  - `Quadratic`. A constant term, linear terms, and quadratic terms (including interactions)
- **Robust option**: Specify whether to use a robust objective function and make your model less sensitive to outliers. With this option, the fitting method automatically assigns lower weights to data points that are more likely to be outliers.
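At the command line, these options correspond roughly to arguments of `fitlm`. The following is a minimal sketch under stated assumptions, not the code the app generates: the synthetic data and all variable names are invented for illustration.

```matlab
% Synthetic data (hypothetical): two predictors and a noisy linear response.
rng(0);                                   % make the example reproducible
X = rand(100,2);
y = 1 + 2*X(:,1) - 3*X(:,2) + 0.1*randn(100,1);

% Terms = Linear: a constant term plus linear terms in the predictors.
mdlLinear = fitlm(X,y,'linear');

% Terms = Interactions: also includes the x1*x2 product term.
mdlInteractions = fitlm(X,y,'interactions');

% Robust option on: likely outliers receive lower weights during fitting.
mdlRobust = fitlm(X,y,'linear','RobustOpts','on');

disp(mdlLinear.NumCoefficients)        % 3 (intercept, x1, x2)
disp(mdlInteractions.NumCoefficients)  % 4 (intercept, x1, x2, x1:x2)
```

The model specifications `'purequadratic'` and `'quadratic'` select the remaining two **Terms** choices in the same way.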

Stepwise linear regression starts with an initial model and systematically adds and removes terms to the model based on the explanatory power of these incrementally larger and smaller models. For Stepwise Linear models, you can set these options:

- **Initial terms**: Specify the terms that are included in the initial model of the stepwise procedure. You can choose from `Constant`, `Linear`, `Interactions`, `Pure Quadratic`, and `Quadratic`.
- **Upper bound on terms**: Specify the highest order of the terms that the stepwise procedure can add to the model. You can choose from `Linear`, `Interactions`, `Pure Quadratic`, and `Quadratic`.
- **Maximum number of steps**: Specify the maximum number of different linear models that can be tried in the stepwise procedure. To speed up training, try reducing the maximum number of steps. However, selecting a small maximum number of steps decreases your chances of finding a good model.
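As a rough command-line counterpart, the stepwise options map onto the starting model specification and the `'Upper'` and `'NSteps'` arguments of `stepwiselm`. This is only a sketch: the data is synthetic and the chosen values are hypothetical, not app defaults.

```matlab
% Synthetic data (hypothetical): only the first predictor drives the response.
rng(0);
X = rand(100,3);
y = 2 + 4*X(:,1) + 0.1*randn(100,1);

mdl = stepwiselm(X,y,'constant', ...   % Initial terms: Constant
    'Upper','interactions', ...        % Upper bound on terms: Interactions
    'NSteps',5, ...                    % Maximum number of steps
    'Verbose',0);                      % suppress the step-by-step log

disp(mdl.Formula)                      % shows which terms were kept
```

Starting from a constant model keeps the search cheap; a lower `'NSteps'` value shortens training further at the cost of possibly stopping before useful terms are added.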

Regression trees are easy to interpret, fast for fitting and
prediction, and low on memory usage. Try to grow smaller trees with
fewer larger leaves to prevent overfitting. Control the leaf size
with the **Minimum leaf size** setting.

In the **Model Type** gallery, click **All
Trees** to try each of the regression
tree options and see which settings produce the best model with your
data. Select the best model in the History list, and try to improve
that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility |
---|---|---|
Fine Tree | Easy | High. Many small leaves for a highly flexible response function (minimum leaf size is 4). |
Medium Tree | Easy | Medium. Medium-sized leaves for a less flexible response function (minimum leaf size is 12). |
Coarse Tree | Easy | Low. Few large leaves for a coarse response function (minimum leaf size is 36). |

To predict a response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. The leaf node contains the value of the response.

Statistics and Machine Learning Toolbox™ trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, consider a simple regression tree that predicts the response based on two predictors, `x1` and `x2`.

To make a prediction, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.
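The traversal can be written out by hand for a very small tree. The tree below is purely hypothetical: its split points and leaf values are invented to illustrate the root-to-leaf walk and do not come from a trained model.

```matlab
% Hypothetical two-predictor tree; thresholds and leaf values are invented.
x1 = 0.3;
x2 = 0.7;

if x1 < 0.5              % root node: check x1
    if x2 < 0.5          % second-level node: check x2
        yhat = 1.0;      % leaf
    else
        yhat = 2.0;      % leaf
    end
else
    yhat = 3.0;          % leaf reached directly from the root
end

disp(yhat)               % 2
```

Each prediction checks one predictor per node, so the cost of a prediction grows with the depth of the tree rather than with the number of training samples.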

You can visualize your regression tree model by exporting the model from the app, and then entering:

    view(trainedModel.RegressionTree,'Mode','graph')

For a workflow example, see Train Regression Trees Using Regression Learner App.

The Regression Learner app uses the `fitrtree` function to train regression trees. You can set these options:

- **Minimum leaf size**: Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the minimum leaf size, click the buttons or enter a positive integer value in the **Minimum leaf size** box.

  A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.

  In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set.

  **Tip:** Decrease the **Minimum leaf size** to create a more flexible model.

- **Surrogate decision splits** (for missing data only): Specify surrogate use for decision splits. If you have data with missing values, use surrogate splits to improve the accuracy of predictions.

  When you set **Surrogate decision splits** to `On`, the regression tree finds at most 10 surrogate splits at each branch node. To change the number of surrogate splits, click the buttons or enter a positive integer value in the **Maximum surrogates per node** box.

  When you set **Surrogate decision splits** to `Find All`, the regression tree finds all surrogate splits at each branch node. The `Find All` setting can use considerable time and memory.
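These two options correspond roughly to the `'MinLeafSize'` and `'Surrogate'` arguments of `fitrtree`. The sketch below uses invented synthetic data and hypothetical variable names; the leaf sizes 4, 12, and 36 are the preset values quoted above.

```matlab
% Synthetic data (hypothetical), with a few missing values in x2.
rng(0);
X = rand(200,2);
y = sin(4*X(:,1)) + X(:,2) + 0.1*randn(200,1);
X(1:10,2) = NaN;

% Smaller leaves give a more flexible tree; larger leaves a coarser one.
fineTree   = fitrtree(X,y,'MinLeafSize',4);
coarseTree = fitrtree(X,y,'MinLeafSize',36);

% Surrogate decision splits On: up to 10 surrogate splits per branch node.
% Use 'Surrogate','all' for the Find All behavior.
surrTree = fitrtree(X,y,'MinLeafSize',12,'Surrogate','on');

fprintf('Fine tree nodes: %d, coarse tree nodes: %d\n', ...
    fineTree.NumNodes, coarseTree.NumNodes);
yhat = predict(surrTree,X);            % rows with NaN still get a prediction
```

Comparing node counts is a quick way to see how strongly the minimum leaf size constrains the tree.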

You can train regression support vector machines (SVMs) in Regression Learner. Linear SVMs are easy to interpret, but can have low predictive accuracy. Nonlinear SVMs are more difficult to interpret, but can be more accurate.

In the **Model Type** gallery, click **All
SVMs** to try each of the SVM
options and see which settings produce the best model with your data.
Select the best model in the History list, and try to improve that
model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility |
---|---|---|
Linear SVM | Easy | Low |
Quadratic SVM | Hard | Medium |
Cubic SVM | Hard | Medium |
Fine Gaussian SVM | Hard | High. Allows rapid variations in the response function. Kernel scale is set to |
Medium Gaussian SVM | Hard | Medium. Gives a less flexible response function. Kernel scale is set to |
Coarse Gaussian SVM | Hard | Low. Gives a rigid response function. Kernel scale is set to |

Statistics and Machine
Learning Toolbox implements linear epsilon-insensitive
SVM regression. This SVM ignores prediction errors that are less than
some fixed number ε. The *support vectors* are
the data points that have errors larger than ε. The function
the SVM uses to predict new values depends only on the support vectors.
To learn more about SVM regression, see Understanding Support Vector Machine
Regression.
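The epsilon-insensitive idea can be illustrated numerically. In this minimal sketch the residuals and the value of ε are invented; the code only evaluates the loss function and does not train an SVM.

```matlab
epsilon  = 0.5;                       % hypothetical width of the insensitive band
residual = [-1.2 -0.4 0 0.3 0.9];     % y - f(x) for five hypothetical points

% Errors inside the epsilon band cost nothing; larger errors are
% penalized by how far they fall outside the band.
loss = max(0, abs(residual) - epsilon);

disp(loss)                            % approximately 0.7 0 0 0 0.4
```

Only the first and last points have errors larger than ε, so in the sense described above only they would act as support vectors.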

For a workflow example, see Train Regression Trees Using Regression Learner App.

Regression Learner uses the `fitrsvm` function to train SVM regression models.

You can set these options in the app:

- **Kernel function**: The kernel function determines the nonlinear transformation applied to the data before the SVM is trained. You can choose from:
  - `Gaussian` or Radial Basis Function (RBF) kernel
  - `Linear` kernel, easiest to interpret
  - `Quadratic` kernel
  - `Cubic` kernel

- **Box constraint mode**: The box constraint controls the penalty imposed on observations with large residuals. A larger box constraint gives a more flexible model. A smaller value gives a more rigid model, less sensitive to overfitting.

  When **Box constraint mode** is set to `Auto`, the app uses a heuristic procedure to select the box constraint.

  Try to fine-tune your model by specifying the box constraint manually. Set **Box constraint mode** to `Manual` and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the **Manual box constraint** box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

  **Tip:** Increase the box constraint value to create a more flexible model.

- **Epsilon mode**: Prediction errors that are smaller than the epsilon (ε) value are ignored and treated as equal to zero. A smaller epsilon value gives a more flexible model.

  When **Epsilon mode** is set to `Auto`, the app uses a heuristic procedure to select the epsilon value.

  Try to fine-tune your model by specifying the epsilon value manually. Set **Epsilon mode** to `Manual` and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the **Manual epsilon** box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

  **Tip:** Decrease the epsilon value to create a more flexible model.

- **Kernel scale mode**: The kernel scale controls the scale of the predictors on which the kernel varies significantly. A smaller kernel scale gives a more flexible model.

  When **Kernel scale mode** is set to `Auto`, the app uses a heuristic procedure to select the kernel scale.

  Try to fine-tune your model by specifying the kernel scale manually. Set **Kernel scale mode** to `Manual` and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the **Manual kernel scale** box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

  **Tip:** Decrease the kernel scale value to create a more flexible model.

- **Standardize**: Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.
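For reference, the same settings can be passed to `fitrsvm` at the command line. The sketch below is an assumption-laden illustration: the data is synthetic, the box constraint and epsilon values are hypothetical manual choices, and the `'auto'` kernel scale heuristic of `fitrsvm` is not guaranteed to match the app's `Auto` mode exactly.

```matlab
% Synthetic data (hypothetical).
rng(0);                                % 'auto' kernel scale uses subsampling
X = rand(150,2);
y = sin(3*X(:,1)) + X(:,2).^2 + 0.05*randn(150,1);

svmMdl = fitrsvm(X,y, ...
    'KernelFunction','gaussian', ...   % Kernel function: Gaussian
    'KernelScale','auto', ...          % heuristic kernel scale
    'BoxConstraint',1, ...             % manual box constraint (hypothetical value)
    'Epsilon',0.1, ...                 % manual epsilon (hypothetical value)
    'Standardize',true);               % Standardize: on

yhat = predict(svmMdl,X);
```

To explore flexibility, vary one setting at a time (for example, a smaller `'Epsilon'` or a larger `'BoxConstraint'`) and compare validation error rather than training error.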

You can train Gaussian process regression (GPR) models in Regression Learner. GPR models are often highly accurate, but can be difficult to interpret.

In the **Model Type** gallery, click **All
GPR Models** to try each of the GPR
model options and see which settings produce the best model with your
data. Select the best model in the History list, and try to improve
that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility |
---|---|---|
Rational Quadratic | Hard | Automatic |
Squared Exponential | Hard | Automatic |
Matern 5/2 | Hard | Automatic |
Exponential | Hard | Automatic |

In Gaussian process regression, the response is modeled using
a probability distribution over a space of functions. The flexibility
of the presets in the **Model Type** gallery is automatically
chosen to give a small training error and, simultaneously, protection
against overfitting. To learn more about Gaussian process regression,
see Gaussian Process Regression
Models.

For a workflow example, see Train Regression Trees Using Regression Learner App.

Regression Learner uses the `fitrgp` function to train GPR models.

You can set these options in the app:

- **Basis function**: The basis function specifies the form of the prior mean function of the Gaussian process regression model. You can choose from `Zero`, `Constant`, and `Linear`. Try to choose a different basis function and see if this improves your model.

- **Kernel function**: The kernel function determines the correlation in the response as a function of the distance between the predictor values. You can choose from `Rational Quadratic`, `Squared Exponential`, `Matern 5/2`, `Matern 3/2`, and `Exponential`.

  To learn more about kernel functions, see Kernel (Covariance) Function Options.

- **Use isotropic kernel**: If you use an isotropic kernel, the correlation length scales are the same for all the predictors. With a nonisotropic kernel, each predictor variable has its own separate correlation length scale.

  Using a nonisotropic kernel can improve the accuracy of your model, but can make the model slow to fit.

  To learn more about nonisotropic kernels, see Kernel (Covariance) Function Options.

- **Kernel mode**: You can manually specify *initial* values of the kernel parameters **Kernel scale** and **Signal standard deviation**. The signal standard deviation is the prior standard deviation of the response values. By default, the app locally optimizes the kernel parameters starting from the initial values. To use fixed kernel parameters, clear the **Optimize numeric parameters** check box in the advanced options.

  When **Kernel mode** is set to `Auto`, the app uses a heuristic procedure to select the initial kernel parameters.

  If you set **Kernel mode** to `Manual`, you can specify the initial values. Click the buttons or enter a positive scalar value in the **Kernel scale** box and the **Signal standard deviation** box.

  If you clear the **Use isotropic kernel** check box, you cannot set initial kernel parameters manually.

- **Sigma mode**: You can manually specify the *initial* value of the observation noise standard deviation **Sigma**. By default, the app optimizes the observation noise standard deviation, starting from the initial value. To keep **Sigma** fixed at its initial value, clear the **Optimize numeric parameters** check box in the advanced options.

  When **Sigma mode** is set to `Auto`, the app uses a heuristic procedure to select the initial observation noise standard deviation.

  If you set **Sigma mode** to `Manual`, you can specify the initial value. Click the buttons or enter a positive scalar value in the **Sigma** box.

- **Standardize**: Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.

- **Optimize numeric parameters**: With this option, the app automatically optimizes numeric parameters of the GPR model. The optimized parameters are the coefficients of the **Basis function**, the kernel parameters **Kernel scale** and **Signal standard deviation**, and the observation noise standard deviation **Sigma**.
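A rough command-line counterpart of these options uses `fitrgp`. In the sketch below the data is synthetic and the initial parameter values are hypothetical; it shows how the options might map to arguments, not what the app does internally.

```matlab
% Synthetic one-dimensional data (hypothetical).
rng(0);
X = linspace(0,1,60)';
y = sin(6*X) + 0.1*randn(60,1);

gprMdl = fitrgp(X,y, ...
    'BasisFunction','constant', ...            % Basis function: Constant
    'KernelFunction','squaredexponential', ... % isotropic Squared Exponential
    'KernelParameters',[0.2; 1], ...           % initial [kernel scale; signal std]
    'Sigma',0.1, ...                           % initial observation noise std
    'Standardize',true);                       % Standardize: on

% By default fitrgp optimizes the parameters starting from these values.
% Adding 'FitMethod','none' would keep the initial values fixed instead.
yhat = predict(gprMdl,X);
disp(gprMdl.KernelInformation.KernelParameters')  % fitted kernel parameters
```

For a nonisotropic kernel, `fitrgp` offers ARD variants such as `'ardsquaredexponential'`, which use one length scale per predictor and therefore need a longer `'KernelParameters'` vector.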

You can train ensembles of regression trees in Regression Learner. Ensemble models combine results from many weak learners into one high-quality ensemble model.

In the **Model Type** gallery, click **All
Ensembles** to try each of the ensemble
options and see which settings produce the best model with your data.
Select the best model in the History list, and try to improve that
model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Ensemble Method | Model Flexibility |
---|---|---|---|
Boosted Trees | Hard | Least-squares boosting | Medium to high |
Bagged Trees | Hard | Bootstrap aggregating or bagging, with regression tree learners. | High |

For a workflow example, see Train Regression Trees Using Regression Learner App.

Regression Learner uses the `fitrensemble` function to train ensemble models. You can set these options:

- **Minimum leaf size**: Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the minimum leaf size, click the buttons or enter a positive integer value in the **Minimum leaf size** box.

  A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.

  In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set.

  **Tip:** Decrease the **Minimum leaf size** to create a more flexible model.

- **Number of learners**: Try changing the number of learners to see if you can improve the model. Many learners can produce high accuracy, but can be time-consuming to fit.

  **Tip:** Increase the **Number of learners** to create a more flexible model.

- **Learning rate**: For boosted trees, specify the learning rate for shrinkage. If you set the learning rate to less than 1, the ensemble requires more learning iterations but often achieves better accuracy. 0.1 is a popular initial choice.
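These ensemble options map approximately onto `fitrensemble` and `templateTree` arguments. The following sketch uses synthetic data; the leaf size and the number of learners are hypothetical choices, not the app's preset values.

```matlab
% Synthetic data (hypothetical).
rng(0);                                 % bagging resamples the data randomly
X = rand(200,3);
y = 5*X(:,1) + sin(6*X(:,2)) + 0.1*randn(200,1);

t = templateTree('MinLeafSize',8);      % Minimum leaf size for each tree learner

% Boosted trees: least-squares boosting with shrinkage.
boosted = fitrensemble(X,y, ...
    'Method','LSBoost', ...
    'NumLearningCycles',30, ...         % Number of learners
    'LearnRate',0.1, ...                % Learning rate
    'Learners',t);

% Bagged trees: bootstrap aggregation (no learning rate applies).
bagged = fitrensemble(X,y, ...
    'Method','Bag', ...
    'NumLearningCycles',30, ...
    'Learners',t);

fprintf('Trained learners: %d boosted, %d bagged\n', ...
    boosted.NumTrained, bagged.NumTrained);
```

A lower learning rate usually calls for more learning cycles, so the two settings are best tuned together.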

- Train Regression Models in Regression Learner App
- Select Data and Validation for Regression Problem
- Feature Selection and Feature Transformation Using Regression Learner App
- Assess Model Performance in Regression Learner App
- Export Regression Model to Predict New Data
- Train Regression Trees Using Regression Learner App