# estimate

Fit univariate ARIMA or ARIMAX model to data

## Syntax

``EstMdl = estimate(Mdl,y)``
``[EstMdl,EstParamCov,logL,info] = estimate(___)``
``EstMdl = estimate(Mdl,Tbl1)``
``[EstMdl,EstParamCov,logL,info] = estimate(Mdl,Tbl1)``
``[___] = estimate(___,Name,Value)``

## Description


`EstMdl = estimate(Mdl,y)` returns the fully specified ARIMA model `EstMdl`. This model stores the estimated parameter values that result from fitting the partially specified ARIMA model `Mdl` to the observed univariate time series `y` by maximum likelihood. `EstMdl` and `Mdl` are the same model type and have the same structure.


`[EstMdl,EstParamCov,logL,info] = estimate(___)` also returns the estimated variance-covariance matrix of the parameter estimates `EstParamCov`, the value of the optimized loglikelihood objective function `logL`, and a structure of summary information `info`.
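One common use of `logL` is comparing candidate models with information criteria. The following is a minimal sketch in base MATLAB; the values of `logL`, `numParam`, and `numObs` are hypothetical placeholders, not results from this page. Econometrics Toolbox also provides `aicbic`, which computes the same criteria from `logL`.

```matlab
% Hypothetical inputs; in practice, take logL from
% [EstMdl,EstParamCov,logL] = estimate(Mdl,y)
logL = -150.2;      % optimized loglikelihood (assumed value)
numParam = 5;       % estimated parameters, e.g. Constant, AR{1}, AR{2}, MA{1}, Variance
numObs = 500;       % number of observations used in the fit

% Information criteria: smaller values indicate a better tradeoff
% between fit and model complexity
aic = -2*logL + 2*numParam;
bic = -2*logL + numParam*log(numObs);
```

BIC penalizes each extra parameter by `log(numObs)` rather than 2, so it tends to favor smaller models than AIC once `numObs` exceeds about 7.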


`EstMdl = estimate(Mdl,Tbl1)` fits the partially specified ARIMA model `Mdl` to the response variable in the input table or timetable `Tbl1`, which contains time series data, and returns the fully specified, estimated ARIMA model `EstMdl`. `estimate` selects the response variable named in `Mdl.SeriesName` or the sole variable in `Tbl1`. To fit the model to a different response variable in `Tbl1`, use the `ResponseVariable` name-value argument. (since R2023b)
`[EstMdl,EstParamCov,logL,info] = estimate(Mdl,Tbl1)` also returns the estimated variance-covariance matrix of the parameter estimates `EstParamCov`, the value of the optimized loglikelihood objective function `logL`, and a structure of summary information `info`. (since R2023b)


`[___] = estimate(___,Name,Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. `estimate` returns the output argument combination for the corresponding input arguments. For example, `estimate(Mdl,y,Y0=y0,X=Pred)` fits the ARIMA model `Mdl` to the vector of response data `y`, specifies the vector of presample response data `y0`, and includes a linear regression term in the model for the exogenous predictor data `Pred`.

Supply all input data using the same data type. Specifically:

• If you specify the numeric vector `y`, optional data sets must be numeric arrays, and you must use the appropriate name-value argument. For example, to specify a presample, set the `Y0` name-value argument to a numeric column vector of presample data.

• If you specify the table or timetable `Tbl1`, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the `Presample` name-value argument to a table or timetable of presample data.

## Examples


Fit an ARMA(2,1) model to simulated data.

Simulate Data from Known Model

Suppose that the data generating process (DGP) is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \epsilon_t + 0.2\epsilon_{t-1},$$

where ${\epsilon }_{t}$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.
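To make the DGP concrete, the recursion can be written directly with the base MATLAB `filter` function. This is a sketch under stated assumptions: it starts from zero initial conditions and draws innovations with `randn`, so it does not reproduce the path that `simulate` returns, which initializes the process differently.

```matlab
rng(5,"twister");                 % reproducible innovations
T = 500;
sigma2 = 0.1;                     % innovation variance
e = sqrt(sigma2)*randn(T,1);      % iid N(0,0.1) innovations

% filter(b,a,e) solves a(1)y(t) + a(2)y(t-1) + a(3)y(t-2) = b(1)e(t) + b(2)e(t-1),
% that is, y(t) = 0.5y(t-1) - 0.3y(t-2) + e(t) + 0.2e(t-1),
% with zero initial conditions
b = [1 0.2];        % MA lag polynomial 1 + 0.2L
a = [1 -0.5 0.3];   % AR lag polynomial 1 - 0.5L + 0.3L^2
y = filter(b,a,e);
```

Note the sign convention: the AR coefficients 0.5 and -0.3 enter the lag polynomial `a` with opposite signs.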

Create the ARMA(2,1) model representing the DGP.

```
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0, ...
    Variance=0.1)
```
```
DGP = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0
              AR: {0.5 -0.3} at lags [1 2]
             SAR: {}
              MA: {0.2} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.1
```

`DGP` is a fully specified `arima` model object.

Simulate a random 500-observation path from the ARMA(2,1) model.

```
rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);
```

`y` is a 500-by-1 column vector representing a simulated response path from the ARMA(2,1) model `DGP`.

Estimate Model

Create an ARMA(2,1) model template for estimation.

`Mdl = arima(2,0,1)`
```
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
```

`Mdl` is a partially specified `arima` model object. Only required, nonestimable parameters that determine the model structure are specified. `NaN`-valued properties, including ${\varphi }_{1}$, ${\varphi }_{2}$, ${\theta }_{1}$, $\mathit{c}$, and ${\sigma }^{2}$, are unknown model parameters to be estimated.
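Because every `NaN`-valued parameter is a parameter that `estimate` fits, you can count the free parameters of a template directly. This sketch assumes a constant (scalar) `Variance` property; a conditional variance model object in `Variance` would need separate handling.

```matlab
Mdl = arima(2,0,1);

% Flag each NaN-valued parameter; AR and MA coefficients are stored in cell arrays
isFree = [isnan(Mdl.Constant), cellfun(@isnan,Mdl.AR), ...
    cellfun(@isnan,Mdl.MA), isnan(Mdl.Variance)];
numFree = sum(isFree)   % constant, two AR terms, one MA term, variance
```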

Fit the ARMA(2,1) model to `y`.

`EstMdl = estimate(Mdl,y)`
```
    ARIMA(2,0,1) Model (Gaussian Distribution):
 
                  Value      StandardError    TStatistic      PValue  
                _________    _____________    __________    __________

    Constant    0.0089018       0.018417        0.48334        0.62886
    AR{1}         0.49563        0.10323         4.8013     1.5767e-06
    AR{2}        -0.25495       0.070155        -3.6341     0.00027897
    MA{1}         0.27737        0.10732         2.5846      0.0097491
    Variance      0.10004      0.0066577         15.027     4.9017e-51
```
```
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.00890178
              AR: {0.495632 -0.254951} at lags [1 2]
             SAR: {}
              MA: {0.27737} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100043
```

MATLAB® displays a table containing an estimation summary, which includes parameter estimates and inferences. For example, the `Value` column contains the maximum likelihood estimates, and the `PValue` column contains $\mathit{p}$-values for the asymptotic $\mathit{t}$-test of the null hypothesis that the corresponding parameter is 0.
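The `TStatistic` and `PValue` columns follow from the `Value` and `StandardError` columns. The sketch below recomputes them for the `AR{1}` row; it assumes a two-sided test against a standard normal reference distribution, which is an assumption on our part, although the result agrees closely with the display.

```matlab
% Values copied from the AR{1} row of the estimation summary above
estimateValue = 0.49563;
standardError = 0.10323;

tStat = estimateValue/standardError;   % asymptotic t statistic

% Two-sided p-value under N(0,1): 2*(1 - Phi(|t|)) = erfc(|t|/sqrt(2))
pValue = erfc(abs(tStat)/sqrt(2));
```

Small differences from the display come from rounding in the copied values.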

`EstMdl` is a fully specified, estimated `arima` model object; its estimates resemble the parameter values of the DGP.

Fit an AR(2) model to simulated data while holding the model constant fixed during estimation.

Simulate Data from Known Model

Suppose the DGP is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \epsilon_t,$$

where ${\epsilon }_{t}$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

Create the AR(2) model representing the DGP.

`DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);`

Simulate a random 500-observation path from the model.

```
rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);
```

Create Model Object Specifying Constraint

Assume that the mean of ${\mathit{y}}_{\mathit{t}}$ is 0, which implies that $\mathit{c}$ is 0.

Create an AR(2) model for estimation. Set $\mathit{c}$ to 0.

`Mdl = arima(ARLags=1:2,Constant=0)`
```
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
```

`Mdl` is a partially specified `arima` model object. Specified parameters include all required parameters and the model constant. `NaN`-valued properties, including ${\varphi }_{1}$, ${\varphi }_{2}$, and ${\sigma }^{2}$, are unknown model parameters to be estimated.

Estimate Model

Fit the AR(2) model template containing the constraint to `y`.

`EstMdl = estimate(Mdl,y)`
```
    ARIMA(2,0,0) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue  
                ________    _____________    __________    __________

    Constant           0              0            NaN           NaN
    AR{1}        0.56342       0.044225          12.74    3.5474e-37
    AR{2}       -0.29355       0.041786        -7.0252     2.137e-12
    Variance     0.10022       0.006644         15.085    2.0476e-51
```
```
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {0.563425 -0.293554} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100222
```

`EstMdl` is a fully specified, estimated `arima` model object; its estimates resemble the parameter values of the AR(2) model `DGP`. The value of $\mathit{c}$ in the estimation summary and object display is `0`, and corresponding inferences are trivial or do not apply.

Load the US equity index data set `Data_EquityIdx`.

`load Data_EquityIdx`

The table `DataTable` includes the time series variable `NYSE`, which contains daily NYSE composite closing prices from January 1990 through December 2001.

Convert the table to a timetable.

```
dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd');
TT = table2timetable(DataTable,'RowTimes',dt);
```

Suppose that an ARIMA(1,1,1) model is appropriate to model the NYSE composite series during the sample period.

Fit an ARIMA(1,1,1) model to the data, and return the estimated parameter covariance matrix.

```
Mdl = arima(1,1,1);
[EstMdl,EstParamCov] = estimate(Mdl,TT{:,"NYSE"});
```
```
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic     PValue 
                ________    _____________    __________    ________

    Constant     0.15745        0.09783         1.6094      0.10752
    AR{1}       -0.21995        0.15642        -1.4062      0.15968
    MA{1}        0.28539        0.15382         1.8554     0.063544
    Variance      17.159        0.20038         85.632            0
```
`EstParamCov`
```
EstParamCov = 4×4

    0.0096   -0.0002    0.0002    0.0023
   -0.0002    0.0245   -0.0240   -0.0060
    0.0002   -0.0240    0.0237    0.0057
    0.0023   -0.0060    0.0057    0.0402
```

`EstMdl` is a fully specified, estimated `arima` model object. Rows and columns of `EstParamCov` correspond to the rows in the table of estimates and inferences; for example, $\widehat{\mathrm{Cov}}(\hat{\varphi}_1,\hat{\theta}_1) = -0.024$.
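Scaling the covariance matrix to a correlation matrix makes the dependence between estimates easier to read. This base MATLAB sketch uses the rounded matrix from the display above, so its entries are approximate; if Statistics and Machine Learning Toolbox is available, `corrcov(EstParamCov)` should give the same result.

```matlab
% Rounded covariance matrix from the display; rows and columns are
% Constant, AR{1}, MA{1}, Variance
EstParamCov = [ 0.0096 -0.0002  0.0002  0.0023
               -0.0002  0.0245 -0.0240 -0.0060
                0.0002 -0.0240  0.0237  0.0057
                0.0023 -0.0060  0.0057  0.0402];

se = sqrt(diag(EstParamCov));           % standard errors
EstParamCorr = EstParamCov./(se*se');   % correlation matrix
```

The AR{1} and MA{1} estimates have a correlation near -0.996, which suggests that the data cannot separate the two terms well; that is consistent with neither coefficient being individually significant in the summary.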

Compute estimated parameter standard errors by taking the square root of the diagonal elements of the covariance matrix.

`estParamSE = sqrt(diag(EstParamCov))`
```
estParamSE = 4×1

    0.0978
    0.1564
    0.1538
    0.2004
```

Compute a Wald-based 95% confidence interval on $\varphi_1$.

```
T = size(TT,1); % Effective sample size
phihat = EstMdl.AR{1};
sephihat = estParamSE(2);
ciphi = phihat + tinv([0.025 0.975],T - 3)*sephihat
```
```ciphi = 1×2 -0.5266 0.0867 ```

The interval contains 0, which suggests that $\varphi_1$ is not significantly different from 0 at the 5% level.
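With several thousand observations, the t quantile used above is very close to the standard normal quantile. As a sketch, the same Wald interval can be computed in base MATLAB (no `tinv`) by obtaining the normal quantile from `erfinv`; the estimate and standard error are copied from the summary above.

```matlab
phihat = -0.21995;     % AR{1} estimate from the summary
sephihat = 0.15642;    % its standard error
alpha = 0.05;          % 1 - confidence level

% Standard normal quantile Phi^(-1)(1 - alpha/2) = sqrt(2)*erfinv(1 - alpha)
z = sqrt(2)*erfinv(1 - alpha);              % about 1.96
ciphiNormal = phihat + [-1 1]*z*sephihat;
```

The result agrees with `ciphi` to about three decimal places.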

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit.

Load the US equity index data set `Data_EquityIdx`.

```
load Data_EquityIdx
T = height(DataTimeTable)
```
```T = 3028 ```

The timetable `DataTimeTable` includes the time series variable `NYSE`, which contains daily NYSE composite closing prices from January 1990 through December 2001.

Plot the daily NYSE price series.

```
figure
plot(DataTimeTable.Time,DataTimeTable.NYSE)
title("NYSE Daily Closing Prices: 1990 - 2001")
```

Prepare Timetable for Estimation

When you plan to supply a timetable, ensure that it has all the following characteristics:

• The selected response variable is numeric and does not contain any missing values.

• The timestamps in the `Time` variable are regular, and they are ascending or descending.
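The checklist above can be bundled into a quick pre-flight check. In this sketch, `isReady` and `isStrictlyMonotonic` are hypothetical helpers, not toolbox functions. The sketch uses the default time-based `isregular` check, which suits fixed steps such as days or weeks; calendar steps such as months would need `isregular(TT,"months")` instead.

```matlab
% Hypothetical helpers (not part of any toolbox)
isStrictlyMonotonic = @(rt) all(seconds(diff(rt)) > 0) || all(seconds(diff(rt)) < 0);
isReady = @(TT,v) isnumeric(TT.(v)) && ~any(ismissing(TT.(v))) ...
    && isregular(TT) && isStrictlyMonotonic(TT.Properties.RowTimes);

% Small synthetic daily timetable for illustration
t = datetime(2020,1,1) + caldays(0:4)';
TT = timetable(t,(1:5)',VariableNames="Y");

ok = isReady(TT,"Y")          % numeric, no missing values, regular, sorted

TTbad = TT;
TTbad.Y(3) = NaN;
okBad = isReady(TTbad,"Y")    % fails because of the missing value
```

The row times are read through `TT.Properties.RowTimes` so that the check does not depend on the row-times variable being named `Time`.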

Remove all rows from the timetable that have a missing value in the NYSE price series.

```
DTT = rmmissing(DataTimeTable,DataVariables="NYSE");
T_DTT = height(DTT)
```
```T_DTT = 3028 ```

Because all sample times have observed NYSE prices, `rmmissing` does not remove any observations.

Determine whether the sampling timestamps have a regular frequency and are sorted.

`areTimestampsRegular = isregular(DTT,"days")`
```areTimestampsRegular = logical 0 ```
`areTimestampsSorted = issorted(DTT.Time)`
```areTimestampsSorted = logical 1 ```

`areTimestampsRegular = 0` indicates that the timestamps of `DTT` are irregular. `areTimestampsSorted = 1` indicates that the timestamps are sorted. Because prices are recorded only on business days, the daily series has gaps at weekends and holidays, which makes it irregular.

Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

```
DTTW = convert2weekly(DTT,Aggregation="mean");
areTimestampsRegular = isregular(DTTW,"weeks")
```
```areTimestampsRegular = logical 1 ```
`T_DTTW = height(DTTW)`
```T_DTTW = 627 ```

`DTTW` is regular.
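The same remedy, averaging business-day data into weekly bins, can be sketched with the base MATLAB function `retime` on synthetic data. This is an alternative to `convert2weekly`, not a reproduction of it: `retime` labels each weekly bin by the start of its calendar week, so its timestamps are not expected to match those that `convert2weekly` returns.

```matlab
% Synthetic daily prices recorded on weekdays only (business-day style data)
d = (datetime(2021,1,4):caldays(1):datetime(2021,3,26))';
d = d(~isweekend(d));
Daily = timetable(d,100 + (1:numel(d))',VariableNames="Price");

dailyRegular = isregular(Daily)        % false: weekend gaps

% Weekly averages, one row per calendar week
Weekly = retime(Daily,"weekly","mean");
weeklyRegular = isregular(Weekly)      % true: rows are 7 days apart
```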

```
figure
plot(DTTW.Time,DTTW.NYSE)
title("NYSE Weekly Average Closing Prices: 1990 - 2001")
```

Create Model Template for Estimation

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.

Create an ARIMA(1,1,1) model template for estimation.

`Mdl = arima(1,1,1)`
```
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
```

`Mdl` is a partially specified `arima` model object.

Fit Model to Data

Fit an ARIMA(1,1,1) model to weekly average NYSE closing prices. Specify the entire series and the response variable name.

`EstMdl = estimate(Mdl,DTTW,ResponseVariable="NYSE");`
```
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.86386        0.46496         1.8579         0.06318
    AR{1}       -0.37582        0.22719        -1.6542         0.09809
    MA{1}        0.47221        0.21741          2.172        0.029858
    Variance       55.89          1.832         30.507     2.1199e-204
```

`EstMdl` is a fully specified, estimated `arima` model object. By default, `estimate` backcasts for the required `Mdl.P = 2` presample responses.

Since R2023b

Because an ARIMA model is a function of previous values, `estimate` requires presample data to initialize the model early in the sampling period. Although `estimate` backcasts for presample data by default, you can specify the required presample data instead. The `P` property of an `arima` model object specifies the required number of presample observations.
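You can query the required presample sizes before partitioning any data. This sketch reads the `P` and `Q` properties of the model templates used on this page. For these nonseasonal models, the displayed values are consistent with `P` = p + D (autoregressive degree plus degree of integration) and `Q` = q; treat that formula as an observation about these examples rather than a general rule.

```matlab
Mdl111 = arima(1,1,1);
Mdl201 = arima(2,0,1);
Mdl210 = arima(2,1,0);

presampleResponses = [Mdl111.P Mdl201.P Mdl210.P]   % presample responses required
presampleResiduals = [Mdl111.Q Mdl201.Q Mdl210.Q]   % presample residuals for MA terms
```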

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply timetables of presample and estimation data sets.

Load the US equity index data set `Data_EquityIdx`.

`load Data_EquityIdx`

Prepare Timetable for Estimation

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

`DTTW = convert2weekly(DataTimeTable,Aggregation="mean");`

Create Model Template for Estimation

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.

Create an ARIMA(1,1,1) model template for estimation.

`Mdl = arima(1,1,1)`
```
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
```

`Mdl.P` is `2`. Therefore, `estimate` requires 2 presample observations to initialize the model for estimation.

Partition Sample

Partition the entire sample `DTTW` into presample and estimation sample timetables. The presample occurs first and contains 2 observations, and the estimation sample contains the remaining observations in `DTTW`.

```
PS = DTTW(1:Mdl.P,:);
ES = DTTW((Mdl.P+1):end,:);
```
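A partition like this should be contiguous: the presample ends immediately before the estimation sample begins, and together they reproduce the full sample. The sketch below checks both properties on a small synthetic weekly timetable that stands in for `DTTW`; the data and the value of `P` are illustrative assumptions.

```matlab
% Synthetic weekly timetable standing in for DTTW
t = datetime(2001,1,5) + calweeks(0:9)';
DTTW = timetable(t,(1:10)',VariableNames="NYSE");
P = 2;                                  % stands in for Mdl.P

PS = DTTW(1:P,:);                       % presample: earliest P rows
ES = DTTW((P+1):end,:);                 % estimation sample: remaining rows

% Presample precedes the estimation sample, and no rows are lost or duplicated
rtPS = PS.Properties.RowTimes;
rtES = ES.Properties.RowTimes;
contiguous = rtPS(end) < rtES(1) && isequal([PS; ES],DTTW)
```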

Estimate Model

Fit an ARIMA(1,1,1) model to the estimation sample. Specify the presample timetable and the response variable names in both timetables.

```
EstMdl = estimate(Mdl,ES,ResponseVariable="NYSE", ...
    Presample=PS,PresampleResponseVariable="NYSE");
```
```
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.83624          0.453          1.846        0.064891
    AR{1}       -0.32862        0.23526        -1.3968         0.16246
    MA{1}        0.42703        0.22613         1.8885        0.058965
    Variance      56.065         1.8433         30.416     3.3809e-203
```

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Specify initial parameter values obtained from an analysis of a pilot sample.

Load the US equity index data set `Data_EquityIdx`.

`load Data_EquityIdx`

Prepare Timetable for Estimation

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

`DTTW = convert2weekly(DataTimeTable,Aggregation="mean");`

Create Model Template for Estimation

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.

Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as `NYSE`.

`Mdl = arima(ARLags=1,D=1,MALags=1,SeriesName="NYSE");`

Fit Model to Pilot Sample

Treat the first two years (1990 and 1991) as a pilot sample for obtaining initial parameter values when fitting the model to the remaining data. Fit the model to the pilot sample. By default, `estimate` uses the response data in the table variable whose name matches `Mdl.SeriesName`.

```
endPilot = datetime(1991,12,31);
DTTW0 = DTTW(DTTW.Time <= endPilot,:);
EstMdl0 = estimate(Mdl,DTTW0,Display="off");
```

`EstMdl0` is a fully specified, estimated `arima` model object.

Estimate Model

Fit an ARIMA(1,1,1) model to the estimation sample. Specify the estimated parameters from the pilot sample fit as initial values for optimization.

```
DTTWEst = DTTW(DTTW.Time > endPilot,:);
c0 = EstMdl0.Constant;
ar0 = EstMdl0.AR;
ma0 = EstMdl0.MA;
var0 = EstMdl0.Variance;
EstMdl = estimate(Mdl,DTTWEst,Constant0=c0,AR0=ar0, ...
    MA0=ma0,Variance0=var0);
```
```
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.93922        0.55503         1.6922        0.090609
    AR{1}       -0.38996        0.26259        -1.4851         0.13753
    MA{1}        0.48477        0.25108         1.9308        0.053513
    Variance      64.661         2.4853         26.018     3.1308e-149
```

Fit an ARIMAX model to simulated time series data.

Simulate Predictor and Response Data

Create the ARIMAX(2,1,0) model for the DGP, represented by ${\mathit{y}}_{\mathit{t}}$ in the equation

$$(1 - 0.5L + 0.3L^2)(1 - L)y_t = 2 + 1.5x_{1,t} + 2.6x_{2,t} - 0.3x_{3,t} + \epsilon_t,$$

where ${\epsilon }_{t}$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

```
DGP = arima(AR={0.5,-0.3},D=1,Constant=2, ...
    Variance=0.1,Beta=[1.5 2.6 -0.3]);
```

Assume that the exogenous variables ${\mathit{x}}_{1,\mathit{t}}$, ${\mathit{x}}_{2,\mathit{t}}$, and ${\mathit{x}}_{3,\mathit{t}}$ are represented by the AR(1) processes

$$\begin{aligned}
x_{1,t} &= 0.1x_{1,t-1} + \eta_{1,t}\\
x_{2,t} &= 0.2x_{2,t-1} + \eta_{2,t}\\
x_{3,t} &= 0.3x_{3,t-1} + \eta_{3,t},
\end{aligned}$$

where ${\eta }_{i,t}$ follows a Gaussian distribution with mean 0 and variance 0.01 for $\mathit{i}\in \left\{1,2,3\right\}$. Create ARIMA models that represent the exogenous variables.

```
MdlX1 = arima(AR=0.1,Constant=0,Variance=0.01);
MdlX2 = arima(AR=0.2,Constant=0,Variance=0.01);
MdlX3 = arima(AR=0.3,Constant=0,Variance=0.01);
```

Simulate length-1000 exogenous series from the AR models. Store the simulated data in a matrix.

```
T = 1000;
rng(10,"twister"); % For reproducibility
x1 = simulate(MdlX1,T);
x2 = simulate(MdlX2,T);
x3 = simulate(MdlX3,T);
X = [x1 x2 x3];
```

`X` is a 1000-by-3 matrix of simulated time series data. Each row corresponds to an observation in the time series, and each column corresponds to an exogenous variable.

Simulate a length-1000 series from the DGP. Specify the simulated exogenous data.

`y = simulate(DGP,T,X=X);`

`y` is a 1000-by-1 vector of response data.

Estimate Model

Create an ARIMA(2,1,0) model template for estimation.

`Mdl = arima(2,1,0)`
```
Mdl = 
  arima with properties:

     Description: "ARIMA(2,1,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 3
               D: 1
               Q: 0
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
```

The model description (`Description` property) and the empty value of `Beta` indicate that the partially specified `arima` model object `Mdl` does not yet include a regression component for the exogenous predictors.

Estimate the ARIMAX(2,1,0) model, and specify the exogenous predictor data. Because `estimate` backcasts for presample responses (a process that requires presample predictor data for ARIMAX models), fit the model to the latest `T - Mdl.P` responses. (Alternatively, you can specify presample responses by using the `Y0` name-value argument.)

`EstMdl = estimate(Mdl,y((Mdl.P + 1):T),X=X);`
```
    ARIMAX(2,1,0) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant      1.7519       0.021143         82.859               0
    AR{1}        0.56076       0.016511         33.963     7.9385e-253
    AR{2}       -0.26625       0.015966        -16.676       1.963e-62
    Beta(1)       1.4764        0.10157         14.536      7.1228e-48
    Beta(2)       2.5638        0.10445         24.547     4.6637e-133
    Beta(3)     -0.34422       0.098623        -3.4903      0.00048249
    Variance     0.10673      0.0047273         22.577     7.3156e-113
```

`EstMdl` is a fully specified, estimated `arima` model object.

When you estimate the model by using `estimate` and supply the exogenous data by specifying the `X` name-value argument, MATLAB® recognizes the model as an ARIMAX(2,1,0) model and includes a linear regression component for the exogenous variables.

The estimated model is

$$(1 - 0.56L + 0.27L^2)(1 - L)y_t = 1.75 + 1.48x_{1,t} + 2.56x_{2,t} - 0.34x_{3,t} + \epsilon_t,$$

which resembles the model `DGP`. Because MATLAB returns the AR coefficients in difference-equation notation, their signs are opposite to those of the corresponding lag-polynomial coefficients in the equation.
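The sign flip and the role of differencing can be checked with polynomial multiplication. In this base MATLAB sketch, the AR values are copied from the estimation summary; `conv` multiplies the AR lag polynomial by (1 - L) once for each degree of integration.

```matlab
arCoeffs = [0.56076 -0.26625];   % AR{1}, AR{2} in difference-equation notation
D = 1;                           % degree of integration

arPoly = [1 -arCoeffs];          % lag polynomial 1 - 0.56076L + 0.26625L^2
fullPoly = arPoly;
for k = 1:D
    fullPoly = conv(fullPoly,[1 -1]);   % multiply by (1 - L)
end
fullPoly                          % coefficients on L^0, L^1, L^2, L^3
```

The product polynomial has degree 3, which matches the value `P: 3` in the display of the ARIMA(2,1,0) template `Mdl`.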

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Compute the fitted weekly average closing prices within the time range of the data.

Load the US equity index data set `Data_EquityIdx`.

`load Data_EquityIdx`

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

```
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
numobs = height(DTTW)
```
```numobs = 627 ```

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.

Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as `NYSE`.

```
Mdl = arima(1,1,1);
Mdl.SeriesName = "NYSE";
```

Fit an ARIMA(1,1,1) model to the entire sample. Suppress the estimation display.

`EstMdl = estimate(Mdl,DTTW,Display="off");`

Infer residuals ${\mathit{e}}_{\mathit{t}}$ from the estimated model by using the same data that you used for estimation.

```
ResidTT = infer(EstMdl,DTTW);
tail(ResidTT)
```
```
       Time         NYSE     NASDAQ    NYSE_Residual    NYSE_Variance
    ___________    ______    ______    _____________    _____________

    16-Nov-2001    577.11    1886.9        5.8562           55.89    
    23-Nov-2001       583    1898.3        5.4409           55.89    
    30-Nov-2001    581.41    1925.8       -2.8105           55.89    
    07-Dec-2001    584.96    1998.1        3.4212           55.89    
    14-Dec-2001    574.03      1981       -12.071           55.89    
    21-Dec-2001     582.1    1967.9        8.7933           55.89    
    28-Dec-2001    590.28    1967.2        6.2015           55.89    
    04-Jan-2002     589.8    1950.4       -1.2004           55.89    
```

`ResidTT` is a 627-by-4 timetable containing the data passed to `estimate` (`DTTW`), the residuals `NYSE_Residual`, and the estimated conditional variances `NYSE_Variance` from the fit. Because the model variance is constant, every element of `NYSE_Variance` equals `55.89`, the model variance estimate.
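Dividing the residuals by the square root of the conditional variances gives standardized residuals, which should look roughly like draws from N(0,1) if the Gaussian model fits well. This sketch uses the eight residuals shown in the `tail` display; on the full timetable, the equivalent would be `ResidTT.NYSE_Residual./sqrt(ResidTT.NYSE_Variance)`.

```matlab
% Last eight residuals and the constant variance estimate from the display
res = [5.8562 5.4409 -2.8105 3.4212 -12.071 8.7933 6.2015 -1.2004]';
condVar = 55.89;

zres = res/sqrt(condVar);   % standardized residuals
```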

Compute the fitted values $\hat{y}_t$, and store them in `ResidTT`.

```
ResidTT.NYSE_YHat = ResidTT.NYSE - ResidTT.NYSE_Residual;
tail(ResidTT)
```
```
       Time         NYSE     NASDAQ    NYSE_Residual    NYSE_Variance    NYSE_YHat
    ___________    ______    ______    _____________    _____________    _________

    16-Nov-2001    577.11    1886.9        5.8562           55.89         571.25  
    23-Nov-2001       583    1898.3        5.4409           55.89         577.56  
    30-Nov-2001    581.41    1925.8       -2.8105           55.89         584.22  
    07-Dec-2001    584.96    1998.1        3.4212           55.89         581.54  
    14-Dec-2001    574.03      1981       -12.071           55.89          586.1  
    21-Dec-2001     582.1    1967.9        8.7933           55.89          573.3  
    28-Dec-2001    590.28    1967.2        6.2015           55.89         584.08  
    04-Jan-2002     589.8    1950.4       -1.2004           55.89            591  
```

Plot the last 200 observations with corresponding fitted values on the same graph.

```
figure
h = plot(ResidTT.Time((end-199):end),ResidTT{(end-199):end,["NYSE" "NYSE_YHat"]});
h(2).LineStyle = "--";
legend(["Observations" "Fitted values"])
title("Model of NYSE Weekly Average Closing Prices")
```

The fitted values closely track the observations.

Plot the residuals versus the fitted values.

```
figure
plot(ResidTT.NYSE_YHat,ResidTT.NYSE_Residual,".",MarkerSize=15)
ylabel("Residuals")
xlabel("Fitted Values")
title("Residual Plot")
```

Residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
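A sketch of the log-transform remedy follows, assuming that you refit the same ARIMA(1,1,1) structure to log prices. `LogNYSE` is a hypothetical variable name introduced for illustration. Setting `SeriesName` lets `estimate` and `infer` select the new variable, as in the example above. You could then repeat the residual plot to judge whether the spread is more even; whether it is depends on the data.

```matlab
load Data_EquityIdx
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
DTTW.LogNYSE = log(DTTW.NYSE);        % hypothetical variable: log weekly averages

Mdl = arima(1,1,1);
Mdl.SeriesName = "LogNYSE";           % select the log series by name
EstMdlLog = estimate(Mdl,DTTW,Display="off");

ResidLogTT = infer(EstMdlLog,DTTW);   % residuals of the log-price model
```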

## Input Arguments


`Mdl`: Partially specified ARIMA model used to indicate constrained and estimable model parameters, specified as an `arima` model object returned by `arima`. Properties of `Mdl` describe the model structure and can specify parameter values.

`estimate` fits unspecified (`NaN`-valued) parameters to the data (`y` or `Tbl1`).

`estimate` treats specified parameters as equality constraints during estimation.

`y`: Single path of observed response data $y_t$, to which the model `Mdl` is fit, specified as a `numobs`-by-1 numeric column vector. The last observation of `y` is the latest observation.

`y` is the continuation of the presample series `Y0`.

Data Types: `double`

Since R2023b

`Tbl1`: Time series data, to which `estimate` fits the model, specified as a table or timetable with `numvars` variables and `numobs` rows.

The selected response variable is a numeric vector representing a single path of `numobs` observations. You can optionally select a response variable $y_t$ from `Tbl1` by using the `ResponseVariable` name-value argument, and you can select `numpreds` predictor variables $x_t$ for the exogenous regression component by using the `PredictorVariables` name-value argument.

Each row is an observation, and measurements in each row occur simultaneously. Variables in `Tbl1` represent the continuation of corresponding variables in `Presample`.

If `Tbl1` is a timetable, it must represent a sample with a regular datetime time step (see `isregular`), and the datetime vector `Tbl1.Time` must be strictly ascending or descending.

If `Tbl1` is a table, the last row contains the latest observation.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `estimate(Mdl,y,Y0=y0,X=Pred)` uses the vector `y0` as presample responses for estimation and includes a linear regression component for the exogenous predictor data in `Pred`.

Estimation Options


Since R2023b

`ResponseVariable`: Response variable $y_t$ to select from `Tbl1` containing the response data, specified as one of the following data types:

• String scalar or character vector containing a variable name in `Tbl1.Properties.VariableNames`

• Variable index (integer) to select from `Tbl1.Properties.VariableNames`

• A length `numvars` logical vector, where `ResponseVariable(j) = true` selects variable `j` from `Tbl1.Properties.VariableNames`, and `sum(ResponseVariable)` is `1`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`).

If `Tbl1` has one variable, the default specifies that variable. Otherwise, the default is the variable whose name matches `Mdl.SeriesName`.

Example: `ResponseVariable="StockRate2"`

Example: `ResponseVariable=[false false true false]` or `ResponseVariable=3` selects the third table variable as the response variable.

Data Types: `double` | `logical` | `char` | `cell` | `string`
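The three selector forms are interchangeable ways of naming one table variable. This base MATLAB sketch uses a hypothetical four-variable table to show that a name, an integer index, and a logical vector with exactly one `true` element resolve to the same variable name. It illustrates the indexing only; it is not how `estimate` resolves the argument internally.

```matlab
% Hypothetical table with four variables
Tbl1 = array2table(rand(5,4),VariableNames=["A" "B" "StockRate2" "C"]);
names = string(Tbl1.Properties.VariableNames);

sel = [false false true false];   % logical form: exactly one true element
assert(sum(sel) == 1)

byName    = "StockRate2";         % name form
byIndex   = names(3);             % integer index form
byLogical = names(sel);           % logical form
sameVariable = isequal(byName,byIndex,byLogical)
```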

`X`: Exogenous predictor data for the linear regression component, specified as a numeric matrix containing `numpreds` columns. Use `X` only when you supply a vector of response data `y`.

`numpreds` is the number of predictor variables.

Rows correspond to observations, and the last row contains the latest observation. `estimate` does not use the regression component in the presample period. `X` must have at least as many observations as are used after the presample period:

• If you specify `Y0`, `X` must have at least `numobs` rows.

• Otherwise, `X` must have at least `numobs` + `Mdl.P` observations to account for the presample removal.

In either case, if you supply more rows than necessary, `estimate` uses the latest observations only.

`estimate` synchronizes `X` and `y` so that the latest observations (last rows) occur simultaneously.

Columns correspond to individual predictor variables.

By default, `estimate` excludes the regression component, regardless of its presence in `Mdl`.

Data Types: `double`
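The row requirements above reduce to simple index arithmetic. This sketch mirrors the ARIMAX example earlier on this page (1000 rows of predictor data, `Mdl.P` = 3, no `Y0`) with placeholder values in `X`. It shows which rows pair with the responses when the latest rows are synchronized; it describes the bookkeeping only, not the internal code of `estimate`.

```matlab
T = 1000;                 % rows of predictor data
P = 3;                    % Mdl.P for an ARIMA(2,1,0) model
numpreds = 3;
X = reshape(1:T*numpreds,T,numpreds);   % placeholder predictor values

% Without Y0: fit numobs = T - P responses, so X needs numobs + P rows
numobs = T - P;
assert(size(X,1) >= numobs + P)

% Synchronize on the latest rows: these rows pair with the responses
Xest = X((end-numobs+1):end,:);          % rows P+1 through T
```

The first `P` rows of `X` remain available for the presample period.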

Since R2023b

`PredictorVariables`: Exogenous predictor variables $x_t$ to select from `Tbl1` containing predictor data for the regression component, specified as one of the following data types:

• String vector or cell vector of character vectors containing `numpreds` variable names in `Tbl1.Properties.VariableNames`

• A length `numpreds` vector of unique indices (positive integers) of variables to select from `Tbl1.Properties.VariableNames`

• A length `numvars` logical vector, where `PredictorVariables(j) = true` selects variable `j` from `Tbl1.Properties.VariableNames`, and `sum(PredictorVariables)` is `numpreds`

The selected variables must be numeric vectors and cannot contain missing values (`NaN`).

If you specify `PredictorVariables`, you must also specify presample response data by using the `Presample` and `PresampleResponseVariable` name-value arguments. For more details, see Algorithms.

By default, `estimate` excludes the regression component, regardless of its presence in `Mdl`.

Example: `PredictorVariables=["M1SL" "TB3MS" "UNRATE"]`

Example: `PredictorVariables=[true false true false]` or `PredictorVariables=[1 3]` selects the first and third table variables to supply the predictor data.

Data Types: `double` | `logical` | `char` | `cell` | `string`

`Options`: Optimization options, specified as an `optimoptions` optimization controller. For details on modifying the default values of the optimizer, see `optimoptions` or `fmincon` in Optimization Toolbox™.

For example, to change the constraint tolerance to `1e-6`, set `options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp")`. Then, pass `options` to `estimate` by using `Options=options`.

By default, `estimate` uses the same default options as `fmincon`, except `Algorithm` is `"sqp"` and `ConstraintTolerance` is `1e-7`.

`Display`: Command Window display option, specified as one or more of the values in this table.

| Value | Information Displayed |
|---|---|
| `"diagnostics"` | Optimization diagnostics |
| `"full"` | Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics |
| `"iter"` | Iterative optimization information |
| `"off"` | None |
| `"params"` | Maximum likelihood parameter estimates, standard errors, t statistics, and p-values of coefficient significance tests |

Example: `Display="off"` is well suited for running a simulation that estimates many models.

Example: `Display=["params" "diagnostics"]` displays all estimation results and the optimization diagnostics.

Data Types: `char` | `cell` | `string`
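As the `Display="off"` example notes, suppressing output is useful when a loop estimates many models. The following is a small Monte Carlo sketch, assuming the AR(2) DGP from the constrained-estimation example on this page; the number of replications is kept small for speed and is an arbitrary choice.

```matlab
DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);
Mdl = arima(ARLags=1:2,Constant=0);   % template with the constant fixed at 0

nRep = 20;                            % number of simulated data sets
T = 500;
rng(1,"twister");                     % for reproducibility
Y = simulate(DGP,T,NumPaths=nRep);    % T-by-nRep matrix of paths

arHat = zeros(nRep,2);
for r = 1:nRep
    EstMdl = estimate(Mdl,Y(:,r),Display="off");   % no Command Window output
    arHat(r,:) = cell2mat(EstMdl.AR);
end
meanAR = mean(arHat)                  % compare with the DGP values 0.5 and -0.3
```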

Presample Specifications


`Y0`: Presample response data $y_t$ to initialize the model, specified as a `numpreobs`-by-1 numeric column vector. Use `Y0` only when you supply the vector of response data `y`.

`numpreobs` is the number of presample observations. Each row is a presample observation, and the last row contains the latest presample observation. `numpreobs` must be at least `Mdl.P`. If `numpreobs` > `Mdl.P`, `estimate` uses only the latest required number of observations.

By default, `estimate` backward forecasts (backcasts) for the necessary amount of presample responses.

For details on partitioning data for estimation, see Time Base Partitions for ARIMA Model Estimation.

Data Types: `double`
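To supply a numeric presample instead of relying on backcasting, split one longer series so that its earliest `Mdl.P` values become `Y0`. This sketch assumes the ARMA(2,1) DGP from the first example; because `E0` is not specified, `estimate` sets the presample residuals to 0.

```matlab
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0,Variance=0.1);
rng(5,"twister");                 % for reproducibility
yAll = simulate(DGP,502);         % 2 presample + 500 estimation observations

Mdl = arima(2,0,1);
y0 = yAll(1:Mdl.P);               % earliest Mdl.P = 2 observations
y  = yAll((Mdl.P+1):end);         % remaining observations

EstMdl = estimate(Mdl,y,Y0=y0,Display="off");   % no backcasting of responses
```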

`E0`: Presample residual data $e_t$ to initialize the model, specified as a `numpreobs`-by-1 numeric column vector. Use `E0` only when you supply the vector of response data `y`.

`numpreobs` is the number of presample observations. Each row is a presample observation, and the last row contains the latest presample observation. `numpreobs` must be at least `Mdl.Q`. If `numpreobs` > `Mdl.Q`, `estimate` uses only the latest required observations.

If `Mdl.Variance` is a conditional variance model object, such as a `garch` model, `estimate` can require more than `Mdl.Q` presample innovations.

By default, `estimate` sets all required presample residuals to `0`, which is the expected value of the corresponding innovations series.

Data Types: `double`
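The sketch below supplies presample residuals for an ARMA(1,1) model. It assumes Econometrics Toolbox; because the data are simulated, the true innovations returned by `simulate` stand in for presample residuals, which is an illustrative shortcut rather than a recommended way to obtain them from real data.

```matlab
% Sketch: presample responses and residuals for an ARMA(1,1) fit.
rng(4)
TrueMdl = arima(Constant=0,AR={0.4},MA={0.3},Variance=1);
[yAll,eAll] = simulate(TrueMdl,301);   % responses and their innovations

Mdl = arima(1,0,1);                    % Mdl.P = 1, Mdl.Q = 1
y0 = yAll(1);                          % one presample response
e0 = eAll(1);                          % one presample residual
y  = yAll(2:end);                      % estimation sample
EstMdl = estimate(Mdl,y,Y0=y0,E0=e0,Display="off");
```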

Presample conditional variances $\sigma_t^2$ to initialize any conditional variance model, specified as a `numpreobs`-by-1 positive column vector. If `Mdl.Variance` is a conditional variance model, `V0` provides initial values for that model. Use `V0` only when you supply the vector of response data `y`.

Each row is a presample observation, and the last row contains the latest presample observation. `numpreobs` must be at least the number of observations required to initialize the conditional variance model in `Mdl.Variance` (see the `estimate` function of that model). If `V0` has extra rows, `estimate` uses only the latest observations.

If the variance is constant, `estimate` ignores `V0`.

By default, `estimate` sets the necessary presample conditional variances to the average squared value of the inferred residuals.

Data Types: `double`
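A minimal sketch of a composite AR(1)-GARCH(1,1) fit with `Y0`, `E0`, and `V0`. It assumes Econometrics Toolbox; the parameter values are arbitrary, and using five presample rows is a deliberately generous choice, relying on the documented behavior that `estimate` keeps only the latest rows it needs.

```matlab
% Sketch: presample responses, residuals, and conditional variances.
rng(5)
TrueMdl = arima(Constant=0,AR={0.3}, ...
    Variance=garch(Constant=0.1,GARCH={0.7},ARCH={0.2}));
[yAll,eAll,vAll] = simulate(TrueMdl,1005);  % responses, innovations, variances

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);        % conditional variance model with NaN parameters
npre = 5;                         % more rows than needed; extras are ignored
y0 = yAll(1:npre);
e0 = eAll(1:npre);
v0 = vAll(1:npre);                % positive presample conditional variances
y  = yAll(npre+1:end);
EstMdl = estimate(Mdl,y,Y0=y0,E0=e0,V0=v0,Display="off");
```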

Since R2023b

Presample data containing the response $y_t$, residual $e_t$, or conditional variance $\sigma_t^2$ series to initialize the model for estimation, specified as a table or timetable, the same type as `Tbl1`, with `numprevars` variables and `numpreobs` rows. Use `Presample` only when you supply a table or timetable of data `Tbl1`.

Each selected variable is a single path of `numpreobs` observations representing the presample of responses, residuals, or conditional variances for the selected response variable in `Tbl1`.

Each row is a presample observation, and measurements in each row occur simultaneously. `numpreobs` must satisfy one of the following conditions:

• `numpreobs` ≥ `Mdl.P` when `Presample` provides only presample responses

• `numpreobs` ≥ `Mdl.Q` when `Presample` provides only presample residuals

• `numpreobs` ≥ `max([Mdl.P Mdl.Q])` when `Presample` provides presample responses and residuals

• When `Presample` provides presample conditional variances, `Mdl` can require more presample observations than the other conditions specify. For more details, see the `estimate` function of the conditional variance model in `Mdl.Variance`.

If you supply more rows than necessary, `estimate` uses the latest required number of observations only.


If `Presample` is a timetable, all the following conditions must be true:

• `Presample` must represent a sample with a regular datetime time step (see `isregular`).

• The inputs `Tbl1` and `Presample` must be consistent in time such that `Presample` immediately precedes `Tbl1` with respect to the sampling frequency and order.

• The datetime vector of sample timestamps `Presample.Time` must be ascending or descending.

If `Presample` is a table, the last row contains the latest presample observation.

By default:

• When `Mdl` is an ARIMA model without an exogenous linear regression component, `estimate` backcasts for necessary presample responses, sets necessary presample residuals to 0, and sets necessary presample variances to the average squared value of inferred residuals.

• When `Mdl` is an ARIMAX model (you specify the `PredictorVariables` name-value argument), you must specify presample response data because `estimate` cannot backcast for presample responses. `estimate` sets necessary presample residuals to 0 and necessary presample variances to the average squared value of inferred residuals.

If you specify `Presample`, you must identify each presample series it provides by using the `PresampleResponseVariable`, `PresampleInnovationVariable`, or `PresampleVarianceVariable` name-value argument, respectively.
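A minimal sketch of the table workflow with a presample timetable. It assumes Econometrics Toolbox and R2023b or later; the variable name `"Y"`, the daily time stamps, and the names `TT`, `PreTT` are hypothetical choices made so that the presample has a regular time step and immediately precedes `Tbl1`.

```matlab
% Sketch: presample timetable immediately preceding the estimation timetable.
rng(6)
yAll = simulate(arima(Constant=0.2,AR={0.5},Variance=1),122);
Time = datetime(2020,1,1) + days(0:121)';   % regular daily time stamps
TT = timetable(Time,yAll,VariableNames="Y");

Mdl = arima(1,0,0);                 % Mdl.P = 1
PreTT = TT(1:2,:);                  % presample rows (extra row is ignored)
Tbl1  = TT(3:end,:);                % sole variable "Y" is the response
EstMdl = estimate(Mdl,Tbl1,Presample=PreTT, ...
    PresampleResponseVariable="Y",Display="off");
```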

Since R2023b

Response variable $y_t$ to select from `Presample` containing presample response data, specified as one of the following data types:

• String scalar or character vector containing the variable name to select from `Presample.Properties.VariableNames`

• Variable index (positive integer) to select from `Presample.Properties.VariableNames`

• A logical vector, where `PresampleResponseVariable(j) = true` selects variable `j` from `Presample.Properties.VariableNames`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`s).

If you specify presample response data by using the `Presample` name-value argument, you must specify `PresampleResponseVariable`.

Example: `PresampleResponseVariable="GDP"`

Example: `PresampleResponseVariable=[false false true false]` or `PresampleResponseVariable=3` selects the third table variable for presample response data.

Data Types: `double` | `logical` | `char` | `cell` | `string`
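The sketch below selects the same presample variable in three ways (name, index, logical vector). It assumes Econometrics Toolbox and R2023b or later; the variable names `"E"` and `"Y"` and the placement of `"Y"` as the second presample variable are hypothetical. Because the three selections pick the same data, the fitted coefficients should agree.

```matlab
% Sketch: three equivalent ways to pick the presample response variable.
rng(7)
[yAll,eAll] = simulate(arima(Constant=0,AR={0.5},Variance=1),203);
Time = datetime(2021,1,1) + days(0:202)';
TT = timetable(Time,yAll,eAll,VariableNames=["Y" "E"]);

PreTT = TT(1:3,["E" "Y"]);          % "Y" is the 2nd presample variable
Tbl1  = TT(4:end,"Y");
Mdl = arima(1,0,0);

Est1 = estimate(Mdl,Tbl1,Presample=PreTT,PresampleResponseVariable="Y",Display="off");
Est2 = estimate(Mdl,Tbl1,Presample=PreTT,PresampleResponseVariable=2,Display="off");
Est3 = estimate(Mdl,Tbl1,Presample=PreTT, ...
    PresampleResponseVariable=[false true],Display="off");
```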

Since R2023b

Residual variable $e_t$ to select from `Presample` containing presample residual data, specified as one of the following data types:

• String scalar or character vector containing the variable name to select from `Presample.Properties.VariableNames`

• Variable index (positive integer) to select from `Presample.Properties.VariableNames`

• A logical vector, where `PresampleInnovationVariable(j) = true` selects variable `j` from `Presample.Properties.VariableNames`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`s).

If you specify presample residual data by using the `Presample` name-value argument, you must specify `PresampleInnovationVariable`.

Example: `PresampleInnovationVariable="GDPInnov"`

Example: `PresampleInnovationVariable=[false false true false]` or `PresampleInnovationVariable=3` selects the third table variable for presample residual data.

Data Types: `double` | `logical` | `char` | `cell` | `string`

Since R2023b

Conditional variance variable $\sigma_t^2$ to select from `Presample` containing presample conditional variance data, specified as one of the following data types:

• String scalar or character vector containing a variable name in `Presample.Properties.VariableNames`

• Variable index (positive integer) to select from `Presample.Properties.VariableNames`

• A logical vector, where `PresampleVarianceVariable(j) = true` selects variable `j` from `Presample.Properties.VariableNames`

The selected variable must be a numeric vector and cannot contain missing values (`NaN`s).

If you specify presample conditional variance data by using the `Presample` name-value argument, you must specify `PresampleVarianceVariable`.

Example: `PresampleVarianceVariable="StockRateVar0"`

Example: `PresampleVarianceVariable=[false false true false]` or `PresampleVarianceVariable=3` selects the third table variable as the presample conditional variance variable.

Data Types: `double` | `logical` | `char` | `cell` | `string`
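A minimal sketch of a presample timetable that carries responses, residuals, and conditional variances for an AR(1)-GARCH(1,1) fit. It assumes Econometrics Toolbox and R2023b or later; the names `"Y"`, `"E"`, `"V"` and the five presample rows are hypothetical choices (extra rows are ignored per the documentation).

```matlab
% Sketch: select all three presample series from one timetable.
rng(8)
TrueMdl = arima(Constant=0,AR={0.3}, ...
    Variance=garch(Constant=0.1,GARCH={0.7},ARCH={0.2}));
[yAll,eAll,vAll] = simulate(TrueMdl,1005);
Time = datetime(2000,1,3) + days(0:1004)';
TT = timetable(Time,yAll,eAll,vAll,VariableNames=["Y" "E" "V"]);

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);
PreTT = TT(1:5,:);                  % presample: responses, residuals, variances
Tbl1  = TT(6:end,"Y");              % estimation sample: response only
EstMdl = estimate(Mdl,Tbl1,Presample=PreTT, ...
    PresampleResponseVariable="Y",PresampleInnovationVariable="E", ...
    PresampleVarianceVariable="V",Display="off");
```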

Initial Parameter Value Specifications


Initial estimate of the model constant c, specified as a numeric scalar.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`

Initial estimates of the nonseasonal AR polynomial coefficients $\varphi \left(L\right)$, specified as a numeric vector.

Elements of `AR0` correspond to nonzero cells of `Mdl.AR`.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`

Initial estimates of the seasonal autoregressive polynomial coefficients $\Phi \left(L\right)$, specified as a numeric vector.

Elements of `SAR0` correspond to nonzero cells of `Mdl.SAR`.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`

Initial estimates of the nonseasonal moving average polynomial coefficients $\theta \left(L\right)$, specified as a numeric vector.

Elements of `MA0` correspond to nonzero cells of `Mdl.MA`.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`

Initial estimates of the seasonal moving average polynomial coefficients $\Theta \left(L\right)$, specified as a numeric vector.

Elements of `SMA0` correspond to nonzero cells of `Mdl.SMA`.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`
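As a sketch (assuming Econometrics Toolbox; data and starting values arbitrary), the call below starts the optimizer from user-chosen values for the constant, AR, and MA coefficients, and leaves the variance at its default starting value.

```matlab
% Sketch: user-specified starting values for an ARMA(1,1) fit.
rng(9)
y = simulate(arima(Constant=0.5,AR={0.6},MA={0.3},Variance=1),300);
Mdl = arima(1,0,1);
[EstMdl,~,~,info] = estimate(Mdl,y,Constant0=0.4,AR0=0.5,MA0=0.2,Display="off");
```

Comparing `info.X0` with `info.X` shows how far the optimizer moved from the starting point.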

Initial estimates of the regression coefficients β, specified as a numeric vector.

The length of `Beta0` must equal `numpreds`, the number of predictor variables. Elements of `Beta0` correspond to the predictor variables represented by the columns of `X` or the variables in `PredictorVariables`.

By default, `estimate` derives initial estimates using standard time series techniques.

Data Types: `double`
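A minimal sketch of an ARIMAX fit with numeric predictors and starting values for the regression coefficients. It assumes Econometrics Toolbox; the data are generated by a plain loop from the ARIMAX(1,0,0) recursion $y_t = c + \phi y_{t-1} + x_t\beta + \varepsilon_t$ with arbitrary values, and the presample response is supplied explicitly.

```matlab
% Sketch: ARIMAX(1,0,0) with two predictors and Beta0 starting values.
rng(10)
T = 250;
X = randn(T+1,2);                        % two exogenous predictors (numpreds = 2)
e = randn(T+1,1);                        % Gaussian innovations
yAll = zeros(T+1,1);                     % yAll(1) = 0 serves as the presample response
for t = 2:T+1
    % y_t = c + phi*y_{t-1} + x_t*beta + e_t, with c = 1, phi = 0.5, beta = [2; -1]
    yAll(t) = 1 + 0.5*yAll(t-1) + X(t,:)*[2; -1] + e(t);
end

Mdl = arima(1,0,0);
EstMdl = estimate(Mdl,yAll(2:end),Y0=yAll(1),X=X(2:end,:), ...
    Beta0=[1.5 -0.5],Display="off");
```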

Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as a positive scalar. `DoF0` must exceed 2.

Data Types: `double`
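A minimal sketch of estimating the degrees of freedom of t-distributed innovations with a custom starting value. It assumes Econometrics Toolbox; the true `DoF` of 5 and the starting value of 8 are arbitrary, and the starting value satisfies the requirement that `DoF0` exceed 2.

```matlab
% Sketch: AR(1) with t innovations; DoF is estimated starting from DoF0 = 8.
rng(11)
TrueMdl = arima(Constant=0,AR={0.4},Variance=1, ...
    Distribution=struct('Name','t','DoF',5));
y = simulate(TrueMdl,500);

Mdl = arima(1,0,0);
Mdl.Distribution = 't';                  % t innovations with NaN DoF
EstMdl = estimate(Mdl,y,DoF0=8,Display="off");
```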

Initial estimates of variances of innovations, specified as a positive scalar or a cell vector of name-value arguments.

| `Mdl.Variance` Value | Description | `'Variance0'` Value |
|---|---|---|
| Numeric scalar or `NaN` | Constant variance | Positive scalar |
| `garch`, `egarch`, or `gjr` model object | Conditional variance model | Cell vector of name-value arguments for specifying initial estimates (see the `estimate` function of the conditional variance model objects). The cell vector must have the form `{'Name1',value1,'Name2',value2,...}`. |

By default, `estimate` derives initial estimates using standard time series techniques.

Example: For a model with a constant variance, set `Variance0=2` to specify an initial variance estimate of `2`.

Example: For a composite conditional mean and variance model, set `Variance0={'Constant0',2,'ARCH0',0.1}` to specify an initial estimate of `2` for the conditional variance model constant, and an initial estimate of `0.1` for the lag 1 coefficient in the ARCH polynomial.

Data Types: `double` | `cell`
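The sketch below mirrors both `Variance0` examples: a cell vector for a composite AR(1)-GARCH(1,1) model and a scalar for a constant-variance model. It assumes Econometrics Toolbox; the simulated data and starting values are arbitrary.

```matlab
% Sketch: Variance0 for conditional and constant variance models.
rng(12)
TrueMdl = arima(Constant=0,AR={0.3}, ...
    Variance=garch(Constant=0.2,GARCH={0.6},ARCH={0.25}));
y = simulate(TrueMdl,1000);

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);
EstMdl = estimate(Mdl,y,Variance0={'Constant0',0.2,'ARCH0',0.1},Display="off");

% Constant-variance model: Variance0 is a positive scalar
EstConst = estimate(arima(1,0,0),y,Variance0=2,Display="off");
```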

Note

• `NaN` values in `y`, `X`, `Y0`, `E0`, and `V0` indicate missing values. `estimate` removes missing values from specified data by listwise deletion.

• For the presample, `estimate` horizontally concatenates `Y0`, `E0`, and `V0`, and then it removes any row of the concatenated matrix containing at least one `NaN`.

• For the estimation sample, `estimate` horizontally concatenates `y` and `X`, and then it removes any row of the concatenated matrix containing at least one `NaN`.

• Regardless of sample, `estimate` synchronizes the specified, possibly jagged vectors with respect to the latest observation of the sample (last row).

This type of data reduction reduces the effective sample size and can create an irregular time series.

• `estimate` issues an error when any table or timetable input contains missing values.
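The following base-MATLAB sketch illustrates the documented listwise deletion and end-of-sample synchronization on small made-up vectors. It is an illustration of the rules in the note, not `estimate`'s internal code.

```matlab
% Illustration only: mimics the documented missing-value rules.
% Estimation sample: concatenate y and X, then drop rows containing any NaN.
y = [1.2; NaN; 0.7; 1.5; 0.9];
X = [0.1 2; 0.3 1; NaN 4; 0.2 3; 0.5 1];
Data = [y X];
keep = ~any(isnan(Data),2);              % rows with no missing values
yClean = Data(keep,1);
XClean = Data(keep,2:end);

% Presample: align jagged vectors at their latest (last) rows, then drop NaN rows.
y0 = [0.4; 0.6; NaN];
e0 = [0.1; -0.2];
n = min(numel(y0),numel(e0));            % synchronize with respect to the last row
Pre = [y0(end-n+1:end) e0(end-n+1:end)];
PreClean = Pre(~any(isnan(Pre),2),:);
```

Here the estimation sample shrinks from five rows to three, and a single `NaN` in the latest presample response leaves only one usable presample row.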

## Output Arguments


Estimated ARIMA model, returned as an `arima` model object.

`EstMdl` is a copy of `Mdl` that has `NaN` values replaced with parameter estimates. `EstMdl` is fully specified.

Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix.

The rows and columns contain the covariances of the parameter estimates. The standard error of each parameter estimate is the square root of the corresponding main-diagonal entry.

The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors.

Parameters corresponding to the rows and columns of `EstParamCov` appear in the following order:

• Constant

• Nonzero `AR` coefficients at positive lags, from the smallest to largest lag

• Nonzero `SAR` coefficients at positive lags, from the smallest to largest lag

• Nonzero `MA` coefficients at positive lags, from the smallest to largest lag

• Nonzero `SMA` coefficients at positive lags, from the smallest to largest lag

• Regression coefficients (when you specify exogenous data), ordered by the columns of `X` or entries of `PredictorVariables`

• Variance parameters, a scalar for constant variance models and vector for conditional variance models (see `estimate` for the order of parameters)

• Degrees of freedom (t-innovation distribution only)

Data Types: `double`
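A minimal sketch of turning `EstParamCov` into standard errors and t statistics, and of the zero row for a parameter held fixed. It assumes Econometrics Toolbox; the simulated data are arbitrary, and the parameter order follows the list above (constant, AR coefficient, variance).

```matlab
% Sketch: standard errors and t statistics from EstParamCov.
rng(14)
y = simulate(arima(Constant=0.5,AR={0.6},Variance=1),300);
Mdl = arima(1,0,0);
[EstMdl,EstParamCov,logL] = estimate(Mdl,y,Display="off");

se  = sqrt(diag(EstParamCov));           % order: Constant, AR{1}, Variance
est = [EstMdl.Constant; EstMdl.AR{1}; EstMdl.Variance];
tStat = est./se;

% Constant held fixed at 0: its row and column of the covariance are zero
MdlFixed = arima(ARLags=1,Constant=0);
[~,CovFixed] = estimate(MdlFixed,y,Display="off");
```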

Optimized loglikelihood objective function value, returned as a numeric scalar.

Data Types: `double`

Optimization summary, returned as a structure array with the fields described in this table.

| Field | Description |
|---|---|
| `exitflag` | Optimization exit flag (see `fmincon` in Optimization Toolbox) |
| `options` | Optimization options controller (see `optimoptions` and `fmincon` in Optimization Toolbox) |
| `X` | Vector of final parameter estimates |
| `X0` | Vector of initial parameter estimates |

For example, you can display the vector of final estimates by entering `info.X` in the Command Window.

Data Types: `struct`
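As a sketch (assuming Econometrics Toolbox; simulated data arbitrary), the code below reads the `info` structure and lays the initial and final parameter vectors side by side. For an AR(1) model with a constant-variance, the second element of `info.X` is the AR coefficient, following the `EstParamCov` ordering.

```matlab
% Sketch: inspect the optimization summary.
rng(15)
y = simulate(arima(Constant=0.5,AR={0.6},Variance=1),300);
[EstMdl,~,~,info] = estimate(arima(1,0,0),y,Display="off");

fprintf("Exit flag: %d\n",info.exitflag)
disp(table(info.X0(:),info.X(:),VariableNames=["Initial" "Final"]))
```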

## Algorithms

• `estimate` infers innovations and conditional variances (when present) of the underlying response series, and then uses constrained maximum likelihood to fit the model `Mdl` to the response data `y`.

• Because you can specify numeric presample data inputs `Y0`, `E0`, and `V0` of differing lengths, `estimate` assumes that all specified sets have these characteristics:

  ◦ The final observation (row) in each set occurs simultaneously.

  ◦ The first observation in the estimation sample immediately follows the last observation in the presample, with respect to the sampling frequency.

• If you specify the `Display` name-value argument, the value overrides the `Diagnostics` and `Display` settings of the `Options` name-value argument. Otherwise, `estimate` displays optimization information using `Options` settings.

• `estimate` uses the outer product of gradients (OPG) method to perform covariance matrix estimation.

• If you supply data in the table or timetable `Tbl1` to estimate an ARIMAX model, `estimate` cannot backcast for presample responses. Therefore, if you specify `PredictorVariables`, you must also specify presample response data by using the `Presample` and `PresampleResponseVariable` name-value arguments.
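To show what the outer product of gradients (OPG) method computes, the base-MATLAB sketch below applies it to an i.i.d. Gaussian sample, where the per-observation scores have a closed form. This is an illustration of the OPG idea under that simplifying assumption, not `estimate`'s implementation for ARIMA models.

```matlab
% Illustration only: OPG covariance for an i.i.d. Gaussian model (mu, sigma^2).
rng(16)
T = 1000;
y = 3 + 2*randn(T,1);                    % i.i.d. N(3,4) sample
muHat = mean(y);                         % Gaussian MLE of the mean
s2Hat = mean((y - muHat).^2);            % Gaussian MLE of the variance (divides by T)
r = y - muHat;

% Per-observation scores of the log-likelihood at the MLE, one row per observation
G = [r/s2Hat, -1/(2*s2Hat) + r.^2/(2*s2Hat^2)];
CovOPG = (G'*G) \ eye(2);                % OPG covariance estimate: inv(G'*G)
seOPG = sqrt(diag(CovOPG));              % standard errors of [mu; sigma^2]
```

For the mean, the OPG variance should be close to the familiar $\hat\sigma^2/T$.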

## References

[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.

[3] Greene, William. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008.

[4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.

## Version History

Introduced in R2012a
