Interactive stepwise regression

`stepwise`

stepwise(X,y)

stepwise(X,y,inmodel,penter,premove)

`stepwise` uses the sample data in `hald.mat` to display a graphical user interface for performing stepwise regression of the response values in `heat` on the predictive terms in `ingredients`.
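Based on the description above, the no-argument call should correspond to loading the sample data and passing the two variables explicitly. A minimal sketch, assuming `hald.mat` is on the MATLAB path:

```matlab
% Load the Hald sample data; this defines ingredients (predictors)
% and heat (response) in the workspace.
load hald

% Open the interactive interface with ingredients as the candidate
% terms and heat as the response.
stepwise(ingredients,heat)
```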

The upper left of the interface displays estimates of the coefficients for all potential terms, with horizontal bars indicating 90% (colored) and 95% (grey) confidence intervals. The red color indicates that, initially, the terms are not in the model. Values displayed in the table are those that would result if the terms were added to the model.

The middle portion of the interface displays summary statistics for the entire model. These statistics are updated with each step.

The lower portion of the interface, **Model History**,
displays the RMSE for the model. The plot tracks the RMSE from step
to step, so you can compare the optimality of different models. Hover
over the blue dots in the history to see which terms were in the model
at a particular step. Click on a blue dot in the history to open a
copy of the interface initialized with the terms in the model at that
step.
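As a rough illustration of the quantity tracked in **Model History**, the sketch below computes an RMSE for a small least-squares fit. The data are made up, and the degrees-of-freedom convention (observations minus estimated coefficients, including the constant) is the usual one for regression; treat it as an assumption about what the interface reports.

```matlab
% Hypothetical data: one predictor, five observations.
X = [1; 2; 3; 4; 5];
y = [1.1; 1.9; 3.2; 3.9; 5.1];

% Design matrix with a constant term, then a least-squares fit.
A = [ones(size(X,1),1) X];
b = A \ y;

% Residuals and error degrees of freedom.
r = y - A*b;
dfe = size(A,1) - size(A,2);

% Root mean squared error of the fit.
rmse = sqrt(sum(r.^2)/dfe);
```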

Initial models, as well as entrance/exit tolerances for the *p*-values of *F*-statistics, are specified using additional input arguments to `stepwise`. Defaults are an initial model with no terms, an entrance tolerance of 0.05, and an exit tolerance of 0.10.

To center and scale the input data (compute *z*-scores) to improve conditioning of the underlying least-squares problem, select `Scale Inputs` from the **Stepwise** menu.
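For reference, centering and scaling to *z*-scores can be sketched as follows. The matrix is hypothetical, and whether the interface performs exactly this computation internally is an assumption; the code shows only the standard definition.

```matlab
% Hypothetical predictor matrix: 4 observations, 3 terms.
X = [ 7 26  6;
      1 29 15;
     11 56  8;
     11 31  8];

% Column means and sample standard deviations.
mu = mean(X,1);
sigma = std(X,0,1);

% Subtract the mean of each column and divide by its standard
% deviation (uses implicit expansion, R2016b or later).
Xz = (X - mu) ./ sigma;
```

After this transformation each column of `Xz` has mean 0 and standard deviation 1, which puts the candidate terms on a common scale.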

You proceed through a stepwise regression in one of two ways:

- Click **Next Step** to select the recommended next step. The recommended next step either adds the most significant term or removes the least significant term. When the regression reaches a local minimum of RMSE, the recommended next step is “Move no terms.” You can perform all of the recommended steps at once by clicking **All Steps**.
- Click a line in the plot or in the table to toggle the state of the corresponding term. Clicking a red line, corresponding to a term not currently in the model, adds the term to the model and changes the line to blue. Clicking a blue line, corresponding to a term currently in the model, removes the term from the model and changes the line to red.

To call `addedvarplot` and produce an added variable plot from the `stepwise` interface, select **Added Variable Plot** from the **Stepwise** menu. A list of terms is displayed. Select the term you want to add, and then click **OK**.

Click **Export** to display a dialog
box that allows you to select information from the interface to save
to the MATLAB® workspace. Check the information you want to export
and, optionally, change the names of the workspace variables to be
created. Click **OK** to export the information.

`stepwise(X,y)` displays the interface using the *p* predictive terms in the *n*-by-*p* matrix `X` and the response values in the *n*-by-1 vector `y`. Distinct predictive terms should appear in different columns of `X`.

**Note**

`stepwise` automatically includes a constant term in all models. Do not enter a column of 1s directly into `X`.

`stepwise` treats `NaN` values in either `X` or `y` as missing values, and ignores them.
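The text says only that missing values are ignored. One plausible reading, shown here as an assumption rather than as the documented behavior, is that any observation with a `NaN` in `X` or `y` is dropped before fitting. The data below are hypothetical.

```matlab
% Hypothetical data with missing values in both X and y.
X = [1 2; 3 NaN; 5 6; 7 8];
y = [1; 2; NaN; 4];

% Keep only rows where neither the predictors nor the response
% contain NaN (assumed row-wise deletion).
keep = ~any(isnan([X y]),2);
Xc = X(keep,:);
yc = y(keep);
```

Under this assumption, only the first and last observations remain in `Xc` and `yc`.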

`stepwise(X,y,inmodel,penter,premove)` additionally specifies the initial model (`inmodel`) and the entrance (`penter`) and exit (`premove`) tolerances for the *p*-values of *F*-statistics. `inmodel` is either a logical vector with length equal to the number of columns of `X`, or a vector of indices, with values ranging from 1 to the number of columns in `X`. The value of `penter` must be less than or equal to the value of `premove`.
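A short sketch of the five-argument form, using the Hald sample data. The choice of starting columns is arbitrary and only for illustration; the tolerances shown match the documented defaults.

```matlab
load hald

% Start with columns 1 and 4 of ingredients in the model. The same
% initial model could be given as a logical vector: logical([1 0 0 1]).
inmodel = [1 4];

% Entrance and exit tolerances; penter must not exceed premove.
penter = 0.05;
premove = 0.10;

stepwise(ingredients,heat,inmodel,penter,premove)
```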

*Stepwise regression* is a systematic method
for adding and removing terms from a multilinear model based on their
statistical significance in a regression. The method begins with an
initial model and then compares the explanatory power of incrementally
larger and smaller models. At each step, the *p*-value
of an *F*-statistic is computed to test models with
and without a potential term. If a term is not currently in the model,
the null hypothesis is that the term would have a zero coefficient
if added to the model. If there is sufficient evidence to reject the
null hypothesis, the term is added to the model. Conversely, if a
term is currently in the model, the null hypothesis is that the term
has a zero coefficient. If there is insufficient evidence to reject
the null hypothesis, the term is removed from the model. The method
proceeds as follows:

1. Fit the initial model.
2. If any terms not in the model have *p*-values less than an entrance tolerance (that is, if it is unlikely that they would have zero coefficient if added to the model), add the one with the smallest *p*-value and repeat this step; otherwise, go to step 3.
3. If any terms in the model have *p*-values greater than an exit tolerance (that is, if it is unlikely that the hypothesis of a zero coefficient can be rejected), remove the one with the largest *p*-value and go to step 2; otherwise, end.
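The three steps can be sketched in code as follows. This is an illustrative reading of the procedure, not the toolbox implementation: the variable names, the partial *F*-test computed from residual sums of squares, and the iteration cap are all assumptions. It uses `fcdf` from Statistics and Machine Learning Toolbox and the Hald sample data.

```matlab
load hald
X = ingredients;  y = heat;
[n,p] = size(X);
penter = 0.05;  premove = 0.10;

% Residual sum of squares for a model with a constant plus the
% columns selected by the logical mask cols.
design = @(cols) [ones(n,1) X(:,cols)];
rss = @(cols) sum((y - design(cols)*(design(cols)\y)).^2);

in = false(1,p);                 % step 1: initial model, constant only
for iter = 1:100                 % cap on iterations as a safeguard
    % p-value of the F-test comparing models with and without term j.
    pval = zeros(1,p);
    for j = 1:p
        with = in;     with(j) = true;
        without = in;  without(j) = false;
        dfe = n - 1 - nnz(with);
        F = (rss(without) - rss(with)) / (rss(with)/dfe);
        pval(j) = 1 - fcdf(F,1,dfe);
    end

    % Step 2: add the most significant excluded term, if any qualifies.
    cand = find(~in & pval < penter);
    if ~isempty(cand)
        [~,k] = min(pval(cand));
        in(cand(k)) = true;
        continue
    end

    % Step 3: remove the least significant included term, if any.
    cand = find(in & pval > premove);
    if ~isempty(cand)
        [~,k] = max(pval(cand));
        in(cand(k)) = false;
        continue
    end
    break                        % no term can be moved: end
end
```

When the loop exits through `break`, `in` marks the terms of the final model, and no excluded term meets the entrance tolerance while no included term exceeds the exit tolerance.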

Depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but may not be globally optimal.