estimate

Fit univariate ARIMA or ARIMAX model to data

Description

EstMdl = estimate(Mdl,y) returns the fully specified ARIMA model EstMdl. This model stores the estimated parameter values resulting from fitting the partially specified ARIMA model Mdl to the observed univariate time series y by using maximum likelihood. EstMdl and Mdl are the same model type and have the same structure.

[EstMdl,EstParamCov,logL,info] = estimate(___) also returns the estimated variance-covariance matrix associated with estimated parameters EstParamCov, the optimized loglikelihood objective function logL, and a data structure of summary information info.

EstMdl = estimate(Mdl,Tbl1) fits the partially specified ARIMA model Mdl to the response variable in the input table or timetable Tbl1, which contains time series data, and returns the fully specified, estimated ARIMA model EstMdl. estimate selects the response variable named in Mdl.SeriesName or the sole variable in Tbl1. To select a different response variable in Tbl1 to fit the model to, use the ResponseVariable name-value argument. (since R2023b)

[EstMdl,EstParamCov,logL,info] = estimate(Mdl,Tbl1) also returns the estimated variance-covariance matrix associated with estimated parameters EstParamCov, the optimized loglikelihood objective function logL, and a data structure of summary information info. (since R2023b)

[___] = estimate(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. estimate returns the output argument combination for the corresponding input arguments. For example, estimate(Mdl,y,Y0=y0,X=Pred) fits the ARIMA model Mdl to the vector of response data y, specifies the vector of presample response data y0, and includes a linear regression term in the model for the exogenous predictor data Pred.

Supply all input data using the same data type. Specifically:

  • If you specify the numeric vector y, optional data sets must be numeric arrays and you must use the appropriate name-value argument. For example, to specify a presample, set the Y0 name-value argument to a numeric column vector of presample data.

  • If you specify the table or timetable Tbl1, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the Presample name-value argument to a table or timetable of presample data.
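
The two conventions in the list above can be sketched side by side. This is a minimal illustration on simulated data, not an official example: the AR(1) data-generating model, the variable name "Y", and the one-observation presample are assumptions chosen for brevity, and the table syntax requires R2023b or later.

```matlab
% Simulate a short AR(1) series (hypothetical data for illustration only).
rng(1,"twister");
DGP = arima(AR=0.5,Constant=0,Variance=1);
y = simulate(DGP,102);

Mdl = arima(1,0,0);     % Template; Mdl.P is 1, so one presample response is needed

% Numeric convention: response vector plus the Y0 name-value argument.
EstMdl1 = estimate(Mdl,y(2:end),Y0=y(1),Display="off");

% Table convention: table input plus the Presample and
% PresampleResponseVariable name-value arguments.
Tbl = table(y,VariableNames="Y");
EstMdl2 = estimate(Mdl,Tbl(2:end,:),Presample=Tbl(1,:), ...
    PresampleResponseVariable="Y",Display="off");
```

Both calls fit the same template to the same observations; only the container type of the inputs, and therefore the name-value arguments used for the presample, differ.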

Examples

Fit an ARMA(2,1) model to simulated data.

Simulate Data from Known Model

Suppose that the data generating process (DGP) is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t + 0.2\varepsilon_{t-1},$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

Create the ARMA(2,1) model representing the DGP.

DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0, ...
    Variance=0.1)
DGP = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0
              AR: {0.5 -0.3} at lags [1 2]
             SAR: {}
              MA: {0.2} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.1

DGP is a fully specified arima model object.

Simulate a random 500 observation path from the ARMA(2,1) model.

rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);

y is a 500-by-1 column vector representing a simulated response path from the ARMA(2,1) model DGP.

Estimate Model

Create an ARMA(2,1) model template for estimation.

Mdl = arima(2,0,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

Mdl is a partially specified arima model object. Only required, nonestimable parameters that determine the model structure are specified. NaN-valued properties, including $\phi_1$, $\phi_2$, $\theta_1$, $c$, and $\sigma^2$, are unknown model parameters to be estimated.

Fit the ARMA(2,1) model to y.

EstMdl = estimate(Mdl,y)
 
    ARIMA(2,0,1) Model (Gaussian Distribution):
 
                  Value      StandardError    TStatistic      PValue  
                _________    _____________    __________    __________

    Constant    0.0089018       0.018417       0.48334         0.62886
    AR{1}         0.49563        0.10323        4.8013      1.5767e-06
    AR{2}        -0.25495       0.070155       -3.6341      0.00027897
    MA{1}         0.27737        0.10732        2.5846       0.0097491
    Variance      0.10004      0.0066577        15.027      4.9017e-51
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.00890178
              AR: {0.495632 -0.254951} at lags [1 2]
             SAR: {}
              MA: {0.27737} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100043

MATLAB® displays a table containing an estimation summary, which includes parameter estimates and inferences. For example, the Value column contains corresponding maximum-likelihood estimates, and the PValue column contains p-values for the asymptotic t-test of the null hypothesis that the corresponding parameter is 0.
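
As a rough check on how the summary columns relate, the following sketch recomputes the test statistics and p-values from the printed values and standard errors. It assumes that TStatistic is Value divided by StandardError and that the two-sided p-value uses a standard normal reference distribution; both are inferences from the displayed numbers, not a statement about how estimate computes them internally. Because the inputs are rounded, the results agree with the display only to a few digits.

```matlab
% Values and standard errors copied from the estimation summary above
% (Constant, AR{1}, AR{2}, MA{1}).
value = [0.0089018; 0.49563; -0.25495; 0.27737];
se    = [0.018417; 0.10323; 0.070155; 0.10732];

% Test statistic for the null hypothesis that each parameter is 0.
tStat = value./se;

% Two-sided p-value under a standard normal reference distribution
% (an assumption). erfc is in base MATLAB: 2*Phi(-|t|) = erfc(|t|/sqrt(2)).
pValue = erfc(abs(tStat)/sqrt(2));

disp(table(value,se,tStat,pValue, ...
    RowNames=["Constant" "AR{1}" "AR{2}" "MA{1}"]))
```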

EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the DGP.

Fit an AR(2) model to simulated data while holding the model constant fixed during estimation.

Simulate Data from Known Model

Suppose the DGP is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

Create the AR(2) model representing the DGP.

DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);

Simulate a random 500 observation path from the model.

rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);

Create Model Object Specifying Constraint

Assume that the mean of $y_t$ is 0, which implies that $c$ is 0.

Create an AR(2) model for estimation. Set c to 0.

Mdl = arima(ARLags=1:2,Constant=0)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

Mdl is a partially specified arima model object. Specified parameters include all required parameters and the model constant. NaN-valued properties, including $\phi_1$, $\phi_2$, and $\sigma^2$, are unknown model parameters to be estimated.

Estimate Model

Fit the AR(2) model template containing the constraint to y.

EstMdl = estimate(Mdl,y)
 
    ARIMA(2,0,0) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue  
                ________    _____________    __________    __________

    Constant           0             0            NaN             NaN
    AR{1}        0.56342      0.044225          12.74      3.5474e-37
    AR{2}       -0.29355      0.041786        -7.0252       2.137e-12
    Variance     0.10022      0.006644         15.085      2.0476e-51
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {0.563425 -0.293554} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100222

EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the AR(2) model DGP. The value of c in the estimation summary and object display is 0, and corresponding inferences are trivial or do not apply.
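
One way to see the effect of the equality constraint programmatically is to inspect the estimated parameter covariance matrix. The sketch below repeats the fit and counts the parameters with a nonzero row in EstParamCov. It assumes that a fixed parameter contributes an all-zero row and column, which is consistent with the zero standard error shown for Constant but is not stated in this section.

```matlab
% Recreate the data and the constrained template from this example.
rng(5,"twister");
DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);
y = simulate(DGP,500);
Mdl = arima(ARLags=1:2,Constant=0);

% Request the covariance matrix as a second output.
[EstMdl,EstParamCov] = estimate(Mdl,y,Display="off");

% Count parameters with at least one nonzero covariance entry.
% Assumption: the fixed constant has an all-zero row and column.
numEstimated = sum(any(EstParamCov))
```

Under that assumption, numEstimated counts the two AR coefficients and the variance, and excludes the fixed constant.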

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001.

Convert the table to a timetable.

dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd');
TT = table2timetable(DataTable,'RowTimes',dt);

Suppose that an ARIMA(1,1,1) model is appropriate for modeling the NYSE composite series during the sample period.

Fit an ARIMA(1,1,1) model to the data, and return the estimated parameter covariance matrix.

Mdl = arima(1,1,1);
[EstMdl,EstParamCov] = estimate(Mdl,TT{:,"NYSE"});
 
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic     PValue 
                ________    _____________    __________    ________

    Constant     0.15745       0.09783         1.6094       0.10752
    AR{1}       -0.21995       0.15642        -1.4062       0.15968
    MA{1}        0.28539       0.15382         1.8554      0.063544
    Variance      17.159       0.20038         85.632             0
EstParamCov
EstParamCov = 4×4

    0.0096   -0.0002    0.0002    0.0023
   -0.0002    0.0245   -0.0240   -0.0060
    0.0002   -0.0240    0.0237    0.0057
    0.0023   -0.0060    0.0057    0.0402

EstMdl is a fully specified, estimated arima model object. Rows and columns of EstParamCov correspond to the rows in the table of estimates and inferences; for example, $\widehat{\mathrm{Cov}}(\hat{\phi}_1,\hat{\theta}_1) = -0.024$.

Compute estimated parameter standard errors by taking the square root of the diagonal elements of the covariance matrix.

estParamSE = sqrt(diag(EstParamCov))
estParamSE = 4×1

    0.0978
    0.1564
    0.1538
    0.2004

Compute a Wald-based 95% confidence interval on ϕ.

T = size(TT,1); % Effective sample size
phihat = EstMdl.AR{1};
sephihat = estParamSE(2);
ciphi = phihat + tinv([0.025 0.975],T - 3)*sephihat
ciphi = 1×2

   -0.5266    0.0867

The interval contains 0, which suggests that ϕ is insignificant.

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit.

Load Data

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx
T = height(DataTimeTable)
T = 
3028

The timetable DataTimeTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001.

Plot the daily NYSE price series.

figure
plot(DataTimeTable.Time,DataTimeTable.NYSE)
title("NYSE Daily Closing Prices: 1990 - 2001")

Figure contains an axes object. The axes object with title NYSE Daily Closing Prices: 1990 - 2001 contains an object of type line.

Prepare Timetable for Estimation

When you plan to supply a timetable, you must ensure it has all the following characteristics:

  • The selected response variable is numeric and does not contain any missing values.

  • The timestamps in the Time variable are regular, and they are ascending or descending.

Create a new timetable, DTT, by removing all missing values from the timetable, relative to the NYSE price series.

DTT = rmmissing(DataTimeTable,DataVariables="NYSE");
T_DTT = height(DTT)
T_DTT = 
3028

Because all sample times have observed NYSE prices, rmmissing does not remove any observations.

Determine whether the sampling timestamps have a regular frequency and are sorted.

areTimestampsRegular = isregular(DTT,"days")
areTimestampsRegular = logical
   0

areTimestampsSorted = issorted(DTT.Time)
areTimestampsSorted = logical
   1

areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular, and areTimestampsSorted = 1 indicates that the timestamps are sorted. These measurements are irregular because observations occur only on business days.

Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

DTTW = convert2weekly(DTT,Aggregation="mean");
areTimestampsRegular = isregular(DTTW,"weeks")
areTimestampsRegular = logical
   1

T_DTTW = height(DTTW)
T_DTTW = 
627

The timetable DTTW is regular.

figure
plot(DTTW.Time,DTTW.NYSE)
title("NYSE Weekly Average Closing Prices: 1990 - 2001")

Figure contains an axes object. The axes object with title NYSE Weekly Average Closing Prices: 1990 - 2001 contains an object of type line.

Create Model Template for Estimation

Create an ARIMA(1,1,1) model template for estimation.

Mdl = arima(1,1,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

Mdl is a partially specified arima model object.

Fit Model to Data

Fit an ARIMA(1,1,1) model to weekly average NYSE closing prices. Specify the entire series and the response variable name.

EstMdl = estimate(Mdl,DTTW,ResponseVariable="NYSE");
 
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.86386       0.46496         1.8579          0.06318
    AR{1}       -0.37582       0.22719        -1.6542          0.09809
    MA{1}        0.47221       0.21741          2.172         0.029858
    Variance       55.89         1.832         30.507      2.1199e-204

EstMdl is a fully specified, estimated arima model object. By default, estimate backcasts for the required Mdl.P = 2 presample responses.

Since R2023b

Because an ARIMA model is a function of previous values, estimate requires presample data to initialize the model early in the sampling period. Although estimate backcasts for presample data by default, you can specify required presample data instead. The P property of an arima model object specifies the required number of presample observations.
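
As a quick illustration, the P property can be read directly from a model template. The two templates below are the ones displayed on this page, where P is 2 for ARIMA(1,1,1) and 3 for ARIMA(2,1,0); the variable names are hypothetical.

```matlab
% Templates matching two of the models displayed on this page.
Mdl111 = arima(1,1,1);
Mdl210 = arima(2,1,0);

% Number of presample responses each template requires.
numPresample = [Mdl111.P Mdl210.P]
```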

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply timetables of presample and estimation data sets.

Load Data

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

Prepare Timetable for Estimation

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

DTTW = convert2weekly(DataTimeTable,Aggregation="mean");

Create Model Template for Estimation

Create an ARIMA(1,1,1) model template for estimation.

Mdl = arima(1,1,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

Mdl.P is 2. Therefore, estimate requires 2 presample observations to initialize the model for estimation.

Partition Sample

Partition the entire sample DTTW into presample and estimation sample timetables. The presample occurs first and contains two observations, and the estimation sample contains the remaining observations in DTTW.

PS = DTTW(1:Mdl.P,:);
ES = DTTW((Mdl.P+1):end,:);

Estimate Model

Fit an ARIMA(1,1,1) model to the estimation sample. Specify the presample timetable and the response variable names for the estimation and presample data.

EstMdl = estimate(Mdl,ES,ResponseVariable="NYSE", ...
    Presample=PS,PresampleResponseVariable="NYSE");
 
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.83624         0.453          1.846         0.064891
    AR{1}       -0.32862       0.23526        -1.3968          0.16246
    MA{1}        0.42703       0.22613         1.8885         0.058965
    Variance      56.065        1.8433         30.416      3.3809e-203

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Specify initial parameter values obtained from an analysis of a pilot sample.

Load Data

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

Prepare Timetable for Estimation

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

DTTW = convert2weekly(DataTimeTable,Aggregation="mean");

Create Model Template for Estimation

Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE.

Mdl = arima(ARLags=1,D=1,MALags=1,SeriesName="NYSE");

Fit Model to Pilot Sample

Treat the first two years as a pilot sample for obtaining initial parameter values when fitting the model to the remaining ten years of data. Fit the model to the pilot sample. By default, estimate uses the response data in the table variable that matches Mdl.SeriesName.

endPilot = datetime(1991,12,31);
DTTW0 = DTTW(DTTW.Time <= endPilot,:);

EstMdl0 = estimate(Mdl,DTTW0,Display="off");

EstMdl0 is a fully specified, estimated arima model object.

Estimate Model

Fit an ARIMA(1,1,1) model to the estimation sample. Specify the estimated parameters from the pilot sample fit as initial values for optimization.

DTTWEst = DTTW(DTTW.Time > endPilot,:);

c0 = EstMdl0.Constant;
ar0 = EstMdl0.AR;
ma0 = EstMdl0.MA;
var0 = EstMdl0.Variance;

EstMdl = estimate(Mdl,DTTWEst,Constant0=c0,AR0=ar0, ...
   MA0=ma0,Variance0=var0);
 
    ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.93922       0.55503         1.6922         0.090609
    AR{1}       -0.38996       0.26259        -1.4851          0.13753
    MA{1}        0.48477       0.25108         1.9308         0.053513
    Variance      64.661        2.4853         26.018      3.1308e-149

Fit an ARIMAX model to simulated time series data.

Simulate Predictor and Response Data

Create the ARIMAX(2,1,0) model for the DGP, represented by $y_t$ in the equation

$$(1 - 0.5L + 0.3L^2)(1 - L)^1 y_t = 2 + 1.5x_{1,t} + 2.6x_{2,t} - 0.3x_{3,t} + \varepsilon_t,$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

DGP = arima(AR={0.5,-0.3},D=1,Constant=2, ...
    Variance=0.1,Beta=[1.5 2.6 -0.3]);

Assume that the exogenous variables $x_{1,t}$, $x_{2,t}$, and $x_{3,t}$ are represented by the AR(1) processes

$$\begin{aligned}
x_{1,t} &= 0.1x_{1,t-1} + \eta_{1,t}\\
x_{2,t} &= 0.2x_{2,t-1} + \eta_{2,t}\\
x_{3,t} &= 0.3x_{3,t-1} + \eta_{3,t},
\end{aligned}$$

where $\eta_{i,t}$ follows a Gaussian distribution with mean 0 and variance 0.01 for $i \in \{1,2,3\}$. Create ARIMA models that represent the exogenous variables.

MdlX1 = arima(AR=0.1,Constant=0,Variance=0.01);
MdlX2 = arima(AR=0.2,Constant=0,Variance=0.01);
MdlX3 = arima(AR=0.3,Constant=0,Variance=0.01);

Simulate length 1000 exogenous series from the AR models. Store the simulated data in a matrix.

T = 1000;
rng(10,"twister"); % For reproducibility
x1 = simulate(MdlX1,T);
x2 = simulate(MdlX2,T);
x3 = simulate(MdlX3,T);
X = [x1 x2 x3];

X is a 1000-by-3 matrix of simulated time series data. Each row corresponds to an observation in the time series, and each column corresponds to an exogenous variable.

Simulate a length 1000 series from the DGP. Specify the simulated exogenous data.

y = simulate(DGP,T,X=X);

y is a 1000-by-1 vector of response data.

Estimate Model

Create an ARIMA(2,1,0) model template for estimation.

Mdl = arima(2,1,0)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,1,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 3
               D: 1
               Q: 0
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

The model description (Description property) and value of Beta suggest that the partially specified arima model object Mdl is agnostic of the exogenous predictors.

Estimate the ARIMAX(2,1,0) model; specify the exogenous predictor data. Because estimate backcasts for presample responses (a process that requires presample predictor data for ARIMAX models), fit the model to the latest T – Mdl.P responses. (Alternatively, you can specify presample responses by using the Y0 name-value argument.)

EstMdl = estimate(Mdl,y((Mdl.P + 1):T),X=X);
 
    ARIMAX(2,1,0) Model (Gaussian Distribution):
 
                 Value      StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant      1.7519       0.021143        82.859                0
    AR{1}        0.56076       0.016511        33.963      7.9428e-253
    AR{2}       -0.26625       0.015966       -16.676       1.9633e-62
    Beta(1)       1.4764        0.10157        14.536       7.1229e-48
    Beta(2)       2.5638        0.10445        24.547      4.6635e-133
    Beta(3)     -0.34422       0.098623       -3.4903       0.00048249
    Variance     0.10673      0.0047273        22.577      7.3157e-113

EstMdl is a fully specified, estimated arima model object.

When you estimate the model by using estimate and supply the exogenous data by specifying the X name-value argument, MATLAB® recognizes the model as an ARIMAX(2,1,0) model and includes a linear regression component for the exogenous variables.

The estimated model is

$$(1 - 0.56L + 0.27L^2)(1 - L)^1 y_t = 1.75 + 1.48x_{1,t} + 2.56x_{2,t} - 0.34x_{3,t} + \varepsilon_t,$$

which resembles the DGP represented by the model object DGP. Because MATLAB returns the AR coefficients of the model expressed in difference-equation notation, their signs are opposite in the equation.
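
The sign flip can be made concrete with a small sketch that converts the displayed AR estimates into the coefficients of the lag operator polynomial. The variable names are hypothetical, and the values are copied from the estimation summary rather than read from EstMdl.AR.

```matlab
% AR estimates in difference-equation notation, as displayed for
% AR{1} and AR{2} (the same layout as the cell vector EstMdl.AR).
arCoeffs = {0.56076 -0.26625};

% Coefficients of L^0, L^1, and L^2 in the lag operator polynomial:
% move the AR terms to the left side, which negates them.
lagPoly = [1, -cell2mat(arCoeffs)]
```

Rounded to two decimals, lagPoly gives the polynomial 1 - 0.56L + 0.27L^2 that appears in the estimated model equation.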

Since R2023b

Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Compute the fitted weekly average closing prices within the time range of the data.

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.

DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
numobs = height(DTTW)
numobs = 
627

Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE.

Mdl = arima(1,1,1);
Mdl.SeriesName = "NYSE";

Fit an ARIMA(1,1,1) model to the entire sample. Suppress the estimation display.

EstMdl = estimate(Mdl,DTTW,Display="off");

Infer residuals $e_t$ from the estimated model.

ResidTT = infer(EstMdl,DTTW);
tail(ResidTT)
       Time         NYSE     NASDAQ    NYSE_Residual    NYSE_Variance
    ___________    ______    ______    _____________    _____________

    16-Nov-2001    577.11    1886.9        5.8562           55.89    
    23-Nov-2001       583    1898.3        5.4409           55.89    
    30-Nov-2001    581.41    1925.8       -2.8105           55.89    
    07-Dec-2001    584.96    1998.1        3.4212           55.89    
    14-Dec-2001    574.03      1981       -12.071           55.89    
    21-Dec-2001     582.1    1967.9        8.7933           55.89    
    28-Dec-2001    590.28    1967.2        6.2015           55.89    
    04-Jan-2002     589.8    1950.4       -1.2004           55.89    

ResidTT is a 627-by-4 timetable containing the data passed to estimate from DTTW, and the residuals NYSE_Residual and estimated conditional variances NYSE_Variance from the fit. Because the model variance is a constant, the conditional variance variable contains a vector completely composed of 55.89, which is the model variance estimate.

Compute the fitted values $\hat{y}_t$ and store them in ResidTT.

ResidTT.NYSE_YHat = ResidTT.NYSE - ResidTT.NYSE_Residual;
tail(ResidTT)
       Time         NYSE     NASDAQ    NYSE_Residual    NYSE_Variance    NYSE_YHat
    ___________    ______    ______    _____________    _____________    _________

    16-Nov-2001    577.11    1886.9        5.8562           55.89         571.25  
    23-Nov-2001       583    1898.3        5.4409           55.89         577.56  
    30-Nov-2001    581.41    1925.8       -2.8105           55.89         584.22  
    07-Dec-2001    584.96    1998.1        3.4212           55.89         581.54  
    14-Dec-2001    574.03      1981       -12.071           55.89          586.1  
    21-Dec-2001     582.1    1967.9        8.7933           55.89          573.3  
    28-Dec-2001    590.28    1967.2        6.2015           55.89         584.08  
    04-Jan-2002     589.8    1950.4       -1.2004           55.89            591  

Plot the last 200 observations with corresponding fitted values on the same graph.

figure
h = plot(ResidTT.Time((end-199):end),ResidTT{(end-199):end,["NYSE" "NYSE_YHat"]});
h(2).LineStyle = "--";
legend(["Observations" "Fitted values"])
title("Model of NYSE Weekly Average Closing Prices")

Figure contains an axes object. The axes object with title Model of NYSE Weekly Average Closing Prices contains 2 objects of type line. These objects represent Observations, Fitted values.

The fitted values closely track the observations.

Plot the residuals versus the fitted values.

figure
plot(ResidTT.NYSE_YHat,ResidTT.NYSE_Residual,".",MarkerSize=15)
ylabel("Residuals")
xlabel("Fitted Values")
title("Residual Plot")

Figure contains an axes object. The axes object with title Residual Plot, xlabel Fitted Values, ylabel Residuals contains a line object which displays its values using only markers.

The residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
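
A minimal sketch of that remedy follows, assuming the same data set and weekly aggregation as above. The variable name LogNYSE and the template name MdlLog are hypothetical, and whether the transform actually stabilizes the residual variance would need to be checked with a new residual plot.

```matlab
load Data_EquityIdx
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");

% Add the log of the weekly average price as a new timetable variable.
DTTW.LogNYSE = log(DTTW.NYSE);

% Fit the same ARIMA(1,1,1) structure to the log series.
MdlLog = arima(1,1,1);
EstMdlLog = estimate(MdlLog,DTTW,ResponseVariable="LogNYSE",Display="off");
```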

Input Arguments

Partially specified ARIMA model used to indicate constrained and estimable model parameters, specified as an arima model object returned by arima. Properties of Mdl describe the model structure and can specify parameter values.

estimate fits unspecified (NaN-valued) parameters to the data y.

estimate treats specified parameters as equality constraints during estimation.

Single path of observed response data $y_t$, to which the model Mdl is fit, specified as a numobs-by-1 numeric column vector. The last observation of y is the latest observation.

y is the continuation of the presample series Y0.

Data Types: double

Since R2023b

Time series data, to which estimate fits the model, specified as a table or timetable with numvars variables and numobs rows.

The selected response variable is a numeric vector representing a single path of numobs observations. You can optionally select a response variable $y_t$ from Tbl1 by using the ResponseVariable name-value argument, and you can select numpreds predictor variables $x_t$ for the exogenous regression component by using the PredictorVariables name-value argument.

Each row is an observation, and measurements in each row occur simultaneously. Variables in Tbl1 represent the continuation of corresponding variables in Presample.

If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending.

If Tbl1 is a table, the last row contains the latest observation.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: estimate(Mdl,y,Y0=y0,X=Pred) uses the vector y0 as presample responses for estimation and includes a linear regression component for the exogenous predictor data in the vector Pred.

Estimation Options

Since R2023b

Response variable $y_t$ to select from Tbl1 containing the response data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames

  • Variable index (integer) to select from Tbl1.Properties.VariableNames

  • A length numvars logical vector, where ResponseVariable(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(ResponseVariable) is 1

The selected variable must be a numeric vector and cannot contain missing values (NaN).

If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to the name in Mdl.SeriesName.

Example: ResponseVariable="StockRate2"

Example: ResponseVariable=[false false true false] or ResponseVariable=3 selects the third table variable as the response variable.

Data Types: double | logical | char | cell | string
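
The three selection forms mirror ordinary table indexing. The sketch below uses a small hypothetical table and plain table subscripting as an analogy, not a call to estimate, to show that a name, an index, and a logical vector can pick out the same variable.

```matlab
% Hypothetical two-variable table for illustration.
Tbl = table([1;2;3],[10;20;30],VariableNames=["NYSE" "NASDAQ"]);

byName    = Tbl{:,"NASDAQ"};        % String scalar naming the variable
byIndex   = Tbl{:,2};               % Integer variable index
byLogical = Tbl{:,[false true]};    % Logical vector with exactly one true

sameSelection = isequal(byName,byIndex,byLogical)
```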

Exogenous predictor data for the linear regression component, specified as a numeric matrix containing numpreds columns. Use X only when you supply a vector of response data y.

numpreds is the number of predictor variables.

Rows correspond to observations, and the last row contains the latest observation. estimate does not use the regression component in the presample period. X must have at least as many observations as are used after the presample period:

  • If you specify Y0, X must have at least numobs rows.

  • Otherwise, X must have at least numobs + Mdl.P observations to account for the presample removal.

In either case, if you supply more rows than necessary, estimate uses the latest observations only.

estimate synchronizes X and y so that the latest observations (last rows) occur simultaneously.

Columns correspond to individual predictor variables.

By default, estimate excludes the regression component, regardless of its presence in Mdl.

Data Types: double
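
The alignment rule for surplus rows can be pictured with plain indexing. This sketch only illustrates which rows are the latest ones when X has more rows than needed in the case where Y0 is specified; it is not the internal implementation, and the sizes are hypothetical.

```matlab
numobs = 5;                 % Length of the response vector y (hypothetical)
X = (1:8)' * [1 10];        % 8-by-2 predictor matrix; row i is [i 10*i]

% Keep the latest numobs rows so that the last row of X lines up
% with the last observation of y.
Xused = X((end - numobs + 1):end,:);
```

Here Xused contains rows 4 through 8 of X, and the three earliest rows are ignored.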

Since R2023b

Exogenous predictor variables $x_t$ to select from Tbl1 containing predictor data for the regression component, specified as one of the following data types:

  • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames

  • A length numpreds vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames

  • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(PredictorVariables) is numpreds

The selected variables must be numeric vectors and cannot contain missing values (NaN).

If you specify PredictorVariables, you must also specify presample response data by using the Presample and PresampleResponseVariable name-value arguments. For more details, see Algorithms.

By default, estimate excludes the regression component, regardless of its presence in Mdl.

Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]

Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.

Data Types: double | logical | char | cell | string

Optimization options, specified as an optimoptions optimization controller. For details on modifying the default values of the optimizer, see optimoptions or fmincon in Optimization Toolbox™.

For example, to change the constraint tolerance to 1e-6, set options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp"). Then, pass options into estimate using Options=options.

By default, estimate uses the same default options as fmincon, except Algorithm is "sqp" and ConstraintTolerance is 1e-7.
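
Put together, a call with custom optimization options might look like the following sketch. It reuses the ARMA(2,1) simulation from the first example and requires Optimization Toolbox; the tolerance value is the one mentioned above and is not a recommendation.

```matlab
% Simulated ARMA(2,1) data, as in the first example on this page.
rng(5,"twister");
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0,Variance=0.1);
y = simulate(DGP,500);
Mdl = arima(2,0,1);

% Optimization controller with a looser constraint tolerance.
options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp");

% Pass the controller to estimate by using the Options argument.
EstMdl = estimate(Mdl,y,Options=options,Display="off");
```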

Command Window display option, specified as one or more of the values in this table.

Value | Information Displayed
"diagnostics" | Optimization diagnostics
"full" | Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
"iter" | Iterative optimization information
"off" | None
"params" | Maximum likelihood parameter estimates, standard errors, and t statistics and p-values of coefficient significance tests

Example: Display="off" is well suited for running a simulation that estimates many models.

Example: Display=["params" "diagnostics"] displays all estimation results and the optimization diagnostics.

Data Types: char | cell | string

Presample Specifications

Presample response data y_t to initialize the model, specified as a numpreobs-by-1 numeric column vector. Use Y0 only when you supply the vector of response data y.

numpreobs is the number of presample observations. Each row is a presample observation, and the last row contains the latest presample observation. numpreobs must be at least Mdl.P. If numpreobs > Mdl.P, estimate uses the latest required number of observations only.

By default, estimate backcasts (backward forecasts) the necessary number of presample responses.

For details on partitioning data for estimation, see Time Base Partitions for ARIMA Model Estimation.

Data Types: double
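
One way to partition a numeric series into presample and estimation samples is sketched below. The simulated AR(2) data and its coefficients are illustrative assumptions; the split itself uses Mdl.P, as described above.

```matlab
% Simulate an illustrative AR(2) series (assumed parameter values).
rng(1)
TrueMdl = arima(Constant=0.2,AR={0.5 -0.3},Variance=1);
y = simulate(TrueMdl,120);

% Reserve the first Mdl.P observations as the presample.
Mdl = arima(2,0,0);
y0 = y(1:Mdl.P);            % presample responses (latest observation last)
yEst = y(Mdl.P+1:end);      % estimation sample

% Fit the model, initializing it with the presample instead of backcasting.
EstMdl = estimate(Mdl,yEst,Y0=y0,Display="off");
```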

Presample residual data e_t to initialize the model, specified as a numpreobs-by-1 numeric column vector. Use E0 only when you supply the vector of response data y.

numpreobs is the number of presample observations. Each row is a presample observation, and the last row contains the latest presample observation. numpreobs must be at least Mdl.Q. If numpreobs > Mdl.Q, estimate uses the latest required number of observations only.

If Mdl.Variance is a conditional variance model object, such as a garch model, estimate can require more than Mdl.Q presample innovations.

By default, estimate sets all required presample residuals to 0, which is the expected value of the corresponding innovations series.

Data Types: double

Presample conditional variances σ_t^2 to initialize any conditional variance model, specified as a numpreobs-by-1 positive column vector. If Mdl.Variance is a conditional variance model, V0 provides initial values for that model. Use V0 only when you supply the vector of response data y.

Each row is a presample observation. numpreobs must be at least the number of observations required to initialize the conditional variance model type in Mdl.Variance (see estimate). If V0 has extra rows, estimate uses only the latest observations. The last row contains the latest presample observation.

If the variance is constant, estimate ignores V0.

By default, estimate sets the necessary presample conditional variances to the average squared value of the inferred residuals.

Data Types: double

Since R2023b

Presample data containing the response y_t, residual e_t, or conditional variance σ_t^2 series to initialize the model for estimation, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1.

Each selected variable is a single path of numpreobs observations representing the presample of responses, residuals, or conditional variances for the selected response variable in Tbl1.

Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must satisfy one of the following conditions:

  • numpreobs ≥ Mdl.P when Presample provides only presample responses

  • numpreobs ≥ Mdl.Q when Presample provides only presample residuals

  • numpreobs ≥ max([Mdl.P Mdl.Q]) when Presample provides presample responses and residuals

  • Mdl can require more presample observations than specified in the other conditions when Presample provides presample conditional variances. For more details, see estimate.

If you supply more rows than necessary, estimate uses the latest required number of observations only.

If Presample is a timetable, all the following conditions must be true:

  • Presample must represent a sample with a regular datetime time step (see isregular).

  • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order.

  • The datetime vector of sample timestamps Presample.Time must be ascending or descending.

If Presample is a table, the last row contains the latest presample observation.

By default:

  • When Mdl is an ARIMA model without an exogenous linear regression component, estimate backcasts for necessary presample responses, sets necessary presample residuals to 0, and sets necessary presample variances to the average squared value of inferred residuals.

  • When Mdl is an ARIMAX model (you specify the PredictorVariables name-value argument), you must specify presample response data because estimate cannot backcast for presample responses. estimate sets necessary presample residuals to 0 and necessary presample variances to the average squared value of inferred residuals.

If you specify Presample, you must specify the presample response, innovation, or conditional variance variable name by using the PresampleResponseVariable, PresampleInnovationVariable, or PresampleVarianceVariable name-value argument, respectively.

Since R2023b

Response variable y_t to select from Presample containing presample response data, specified as one of the following data types:

  • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames

The selected variable must be a numeric vector and cannot contain missing values (NaNs).

If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable.

Example: PresampleResponseVariable="GDP"

Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable for presample response data.

Data Types: double | logical | char | cell | string

Since R2023b

Residual variable e_t to select from Presample containing presample residual data, specified as one of the following data types:

  • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames

The selected variable must be a numeric vector and cannot contain missing values (NaNs).

If you specify presample residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable.

Example: PresampleInnovationVariable="GDPInnov"

Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample residual data.

Data Types: double | logical | char | cell | string

Since R2023b

Conditional variance variable σ_t^2 to select from Presample containing presample conditional variance data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames

The selected variable must be a numeric vector and cannot contain missing values (NaNs).

If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable.

Example: PresampleVarianceVariable="StockRateVar0"

Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable.

Data Types: double | logical | char | cell | string

Initial Parameter Value Specifications

Initial estimate of the model constant c, specified as a numeric scalar.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimates of the nonseasonal AR polynomial coefficients ϕ(L), specified as a numeric vector.

Elements of AR0 correspond to nonzero cells of Mdl.AR.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimates of the seasonal autoregressive polynomial coefficients Φ(L), specified as a numeric vector.

Elements of SAR0 correspond to nonzero cells of Mdl.SAR.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimates of the nonseasonal moving average polynomial coefficients θ(L), specified as a numeric vector.

Elements of MA0 correspond to elements of Mdl.MA.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimates of the seasonal moving average polynomial coefficients Θ(L), specified as a numeric vector.

Elements of SMA0 correspond to nonzero cells of Mdl.SMA.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimates of the regression coefficients β, specified as a numeric vector.

The length of Beta0 must equal numpreds. Elements of Beta0 correspond to the predictor variables represented by the columns of X or PredictorVariables.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as a positive scalar. DoF0 must exceed 2.

Data Types: double

Initial estimates of variances of innovations, specified as a positive scalar or a cell vector of name-value arguments.

Mdl.Variance Value | Description | Variance0 Value
Numeric scalar or NaN | Constant variance | Positive scalar
garch, egarch, or gjr model object | Conditional variance model | Cell vector of name-value arguments for specifying initial estimates, see the estimate function of the conditional variance model objects. The cell vector must have the form {'Name1',value1,'Name2',value2,...}.

By default, estimate derives initial estimates using standard time series techniques.

Example: For a model with a constant variance, set Variance0=2 to specify an initial variance estimate of 2.

Example: For a composite conditional mean and variance model, set Variance0={'Constant0',2,'ARCH0',0.1} to specify an initial estimate of 2 for the conditional variance model constant, and an initial estimate of 0.1 for the lag 1 coefficient in the ARCH polynomial.

Data Types: double | cell
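
The initial-value arguments in this group can be combined in one call, as in the following sketch. The simulated ARMA(1,1) data and the particular starting values are assumptions chosen for illustration; any starting values would need to describe a stationary, invertible model with a positive variance.

```matlab
% Simulate an illustrative ARMA(1,1) series (assumed parameter values).
rng(1)
TrueMdl = arima(Constant=0.2,AR={0.5},MA={0.3},Variance=1);
y = simulate(TrueMdl,200);

% Supply starting values for the constant, AR, MA, and variance parameters.
[EstMdl,~,~,info] = estimate(arima(1,0,1),y,Constant0=0.1,AR0=0.4, ...
    MA0=0.2,Variance0=1,Display="off");

% Compare the initial and final parameter vectors stored in the summary.
x0 = info.X0;     % initial parameter estimates
xHat = info.X;    % final parameter estimates
```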

Note

  • NaN values in y, X, Y0, E0, and V0 indicate missing values. estimate removes missing values from specified data by listwise deletion.

    • For the presample, estimate horizontally concatenates Y0, E0, and V0, and then it removes any row of the concatenated matrix containing at least one NaN.

    • For the estimation sample, estimate horizontally concatenates y and X, and then it removes any row of the concatenated matrix containing at least one NaN.

    • Regardless of sample, estimate synchronizes the specified, possibly jagged vectors with respect to the latest observation of the sample (last row).

    This type of data reduction reduces the effective sample size and can create an irregular time series.

  • estimate issues an error when any table or timetable input contains missing values.
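
The listwise deletion rule for the estimation sample can be imitated by hand, as sketched below. This only illustrates the rule stated in the note and is not the internal code of estimate; the data values are made up.

```matlab
% Illustrative response vector and predictor matrix with missing values.
y = [1; NaN; 3; 4; 5];
X = [10 1; 20 2; NaN 3; 40 4; 50 5];

% Concatenate horizontally, then flag rows that contain no NaN.
D = [y X];
keep = ~any(isnan(D),2);

% Keep only complete rows; the effective sample size drops from 5 to 3.
yClean = y(keep);
XClean = X(keep,:);
```

Rows 2 and 3 are removed here, so the remaining observations are no longer equally spaced in time, which is the irregularity that the note warns about.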

Output Arguments

Estimated ARIMA model, returned as an arima model object.

EstMdl is a copy of Mdl that has NaN values replaced with parameter estimates. EstMdl is fully specified.

Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix.

The rows and columns contain the covariances of the parameter estimates. The standard errors of the parameter estimates are the square roots of the main diagonal entries.

The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors.

Parameters corresponding to the rows and columns of EstParamCov appear in the following order:

  • Constant

  • Nonzero AR coefficients at positive lags, from the smallest to largest lag

  • Nonzero SAR coefficients at positive lags, from the smallest to largest lag

  • Nonzero MA coefficients at positive lags, from the smallest to largest lag

  • Nonzero SMA coefficients at positive lags, from the smallest to largest lag

  • Regression coefficients (when you specify exogenous data), ordered by the columns of X or entries of PredictorVariables

  • Variance parameters, a scalar for constant variance models and a vector for conditional variance models (see estimate for the order of parameters)

  • Degrees of freedom (t-innovation distribution only)

Data Types: double
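
As a small sketch of how the covariance matrix is typically used, the following computes standard errors from its main diagonal. The simulated AR(1) data and its coefficient values are illustrative assumptions.

```matlab
% Simulate an illustrative AR(1) series (assumed parameter values).
rng(1)
y = simulate(arima(Constant=0.2,AR={0.5},Variance=1),200);

% Fit the model and request the estimated parameter covariance matrix.
[EstMdl,EstParamCov] = estimate(arima(1,0,0),y,Display="off");

% Standard errors are the square roots of the main diagonal entries,
% in the parameter order listed above (constant, AR, ..., variance).
se = sqrt(diag(EstParamCov));
```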

Optimized loglikelihood objective function value, returned as a numeric scalar.

Data Types: double

Optimization summary, returned as a structure array with the fields described in this table.

Field | Description
exitflag | Optimization exit flag (see fmincon in Optimization Toolbox)
options | Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
X | Vector of final parameter estimates
X0 | Vector of initial parameter estimates

For example, you can display the vector of final estimates by entering info.X in the Command Window.

Data Types: struct

Tips

  • To access values of the estimation results, including the number of free parameters in the model, pass EstMdl to summarize.
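
A brief sketch of this tip follows. The simulated data are an assumption for illustration, and the field name NumEstimatedParameters is quoted from memory of the summarize reference page, so confirm it against that page for your release.

```matlab
% Simulate an illustrative AR(1) series and fit the model (assumed values).
rng(1)
y = simulate(arima(Constant=0.2,AR={0.5},Variance=1),200);
EstMdl = estimate(arima(1,0,0),y,Display="off");

% Return the estimation summary as a structure instead of printing it.
results = summarize(EstMdl);

% Number of free parameters that estimate fit to the data.
numParams = results.NumEstimatedParameters;
```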

Algorithms

  • estimate infers innovations and conditional variances (when present) of the underlying response series, and then uses constrained maximum likelihood to fit the model Mdl to the response data y.

  • Because you can specify numeric presample data inputs Y0, E0, and V0 of differing lengths, estimate assumes that all specified sets have these characteristics:

    • The final observation (row) in each set occurs simultaneously.

    • The first observation in the estimation sample immediately follows the last observation in the presample, with respect to the sampling frequency.

  • If you specify the Display name-value argument, the value overrides the Diagnostics and Display settings of the Options name-value argument. Otherwise, estimate displays optimization information using Options settings.

  • estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation.

  • If you supply data in the table or timetable Tbl1 to estimate an ARIMAX model, estimate cannot backcast for presample responses. Therefore, if you specify PredictorVariables, you must also specify presample response data by using the Presample and PresampleResponseVariable name-value arguments.

References

[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.

[3] Greene, William H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008.

[4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.

Version History

Introduced in R2012a
