arima

Convert regression model with ARIMA errors to ARIMAX model

Description

The arima object function converts a specified regression model with ARIMA errors (regARIMA model object) to the equivalent ARIMAX model (arima model object). To create an ARIMAX model directly, see the arima function.

ARIMAXMdl = arima(Mdl) returns ARIMAXMdl, the fully specified ARIMAX model representation of the fully specified regression model with ARIMA time series errors Mdl.

[ARIMAXMdl,XNew] = arima(Mdl,X=X) returns the matrix of predictor data XNew for the output ARIMAX model, transformed from the specified matrix of predictor data X associated with the input regression model with ARIMA errors.

[ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1) returns the table or timetable of predictor data Tbl2 for the output ARIMAX model, transformed from the specified predictor data in the table or timetable Tbl1 associated with the input regression model with ARIMA errors. arima selects all variables in Tbl1 as predictor variables for the regression component of Mdl. (since R2023b)

[ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1,PredictorVariables=PredictorVariables) selects the variable names in PredictorVariables from Tbl1 for the regression component in Mdl. (since R2023b)

Examples

Convert a regression model with ARMA(4,1) errors to an ARIMAX model using the arima converter. Provide predictor data in a numeric array.

Specify the regression model with ARMA(4,1) errors:

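$$\begin{aligned}
y_t &= 1 + 0.5X_t + u_t \\
u_t &= 0.8u_{t-1} - 0.4u_{t-4} + \varepsilon_t + 0.3\varepsilon_{t-1},
\end{aligned}$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.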

Mdl = regARIMA(AR={0.8 -0.4},ARLags=[1 4],MA=0.3, ...
    Intercept=1,Beta=0.5,Variance=1)
Mdl = 
  regARIMA with properties:

     Description: "Regression with ARMA(4,1) Error Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
       Intercept: 1
            Beta: [0.5]
               P: 4
               Q: 1
              AR: {0.8 -0.4} at lags [1 4]
             SAR: {}
              MA: {0.3} at lag [1]
             SMA: {}
        Variance: 1

You can verify that the lags of the autoregressive terms are 1 and 4 in the AR row.

Generate random predictor data.

rng(1,"twister"); % For reproducibility
T = 20;
Pred = randn(T,1);

Convert Mdl to an ARIMAX model. Supply the random set of predictor data Pred for Mdl and return the predictor data for the converted model.

[ARIMAXMdl,XNew] = arima(Mdl,X=Pred);
ARIMAXMdl
ARIMAXMdl = 
  arima with properties:

     Description: "ARIMAX(4,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 4
               D: 0
               Q: 1
        Constant: 0.6
              AR: {0.8 -0.4} at lags [1 4]
             SAR: {}
              MA: {0.3} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1 -0.8 0.4]
        Variance: 1

The output arima model ARIMAXMdl is

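$$y_t = 0.6 + Z_t\Gamma + 0.8y_{t-1} - 0.4y_{t-4} + \varepsilon_t + 0.3\varepsilon_{t-1},$$

where

$$Z_t\Gamma = \begin{bmatrix}
0.5x_1 & \text{NaN} & \text{NaN} \\
0.5x_2 & 0.5x_1 & \text{NaN} \\
0.5x_3 & 0.5x_2 & \text{NaN} \\
0.5x_4 & 0.5x_3 & \text{NaN} \\
0.5x_5 & 0.5x_4 & 0.5x_1 \\
\vdots & \vdots & \vdots \\
0.5x_T & 0.5x_{T-1} & 0.5x_{T-4}
\end{bmatrix}
\begin{bmatrix} 1 \\ -0.8 \\ 0.4 \end{bmatrix}$$

and $x_j$ is row j of Pred. Because the product of the autoregressive and integration polynomials is $\phi(L) = 1 - 0.8L + 0.4L^4$, ARIMAXMdl.Beta is [1; -0.8; 0.4]. Note that the software carries over the autoregressive and moving average coefficients from Mdl to ARIMAXMdl. Also, Mdl.Intercept = 1 and ARIMAXMdl.Constant = (1 - 0.8 + 0.4)(1) = 0.6; that is, the regARIMA model intercept and the arima model constant are generally unequal.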
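The arithmetic above can be reproduced by hand. The following is a minimal sketch in base MATLAB that does not call the arima converter; the variable names (arPoly, gammaVec, Z, and so on) are hypothetical and introduced only for illustration. It evaluates the constant, extracts the nonzero lag coefficients, and builds a lagged matrix with the layout shown above.

```matlab
% Illustrative check only; no toolbox calls. All names are hypothetical.
arPoly    = [1 -0.8 0 0 0.4];        % coefficients of 1 - 0.8L + 0.4L^4 at lags 0..4
intercept = 1;                       % value of Mdl.Intercept
regBeta   = 0.5;                     % value of Mdl.Beta

constant = sum(arPoly)*intercept;    % polynomial evaluated at L = 1, times the intercept
lags     = find(arPoly ~= 0) - 1;    % lags with nonzero coefficients: [0 1 4]
gammaVec = arPoly(lags + 1);         % corresponding coefficients: [1 -0.8 0.4]

rng(1,"twister");                    % same seed as the example
T    = 20;
Pred = randn(T,1);
xb   = Pred*regBeta;                 % x_t*beta for every row

% One column per nonzero lag; rows without enough history stay NaN.
Z = NaN(T,numel(lags));
for k = 1:numel(lags)
    Z(lags(k)+1:end,k) = xb(1:end-lags(k));
end
```

Under the layout described above, Z should have the same shape and NaN pattern as XNew (20 rows, 3 columns, with the first four rows of the last column missing).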

Convert a regression model with seasonal ARIMA errors to an ARIMAX model using the arima converter.

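Specify the regression model with ARIMA(2,1,1)×(1,1,0)$_2$ errors:

$$\begin{aligned}
y_t &= X_t\begin{bmatrix} -2 \\ 1 \end{bmatrix} + u_t \\
(1 - 0.3L + 0.15L^2)(1 - L)(1 - 0.2L^2)(1 - L^2)u_t &= (1 + 0.1L)\varepsilon_t,
\end{aligned}$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.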

Mdl = regARIMA(AR={0.3, -0.15},MA=0.1,ARLags=[1 2], ...
    SAR=0.2,SARLags=2,Seasonality=2,D=1, ...
    Intercept=0,Beta=[-2; 1],Variance=1)
Mdl = 
  regARIMA with properties:

     Description: "Regression with ARIMA(2,1,1) Error Model Seasonally Integrated with Seasonal AR(2) (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [-2 1]
               P: 7
               D: 1
               Q: 1
              AR: {0.3 -0.15} at lags [1 2]
             SAR: {0.2} at lag [2]
              MA: {0.1} at lag [1]
             SMA: {}
     Seasonality: 2
        Variance: 1

Generate predictor data.

rng(1,"twister"); % For reproducibility
T = 20;
Pred = randn(T,2);

Convert Mdl to an ARIMAX model. Supply the random set of predictor data Pred for Mdl and return the predictor data for the converted model.

[ARIMAX,XNew] = arima(Mdl,X=Pred);
ARIMAX
ARIMAX = 
  arima with properties:

     Description: "ARIMAX(2,1,1) Model Seasonally Integrated with Seasonal AR(2) (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 7
               D: 1
               Q: 1
        Constant: 0
              AR: {0.3 -0.15} at lags [1 2]
             SAR: {0.2} at lag [2]
              MA: {0.1} at lag [1]
             SMA: {}
     Seasonality: 2
            Beta: [1 -1.3 -0.75 1.41 -0.34 -0.08 0.09 -0.03]
        Variance: 1

Mdl.Beta has length 2, but ARIMAX.Beta has length 8. This is because the product of the autoregressive and integration polynomials, $\phi(L)(1-L)\Phi(L)(1-L^s)$, is

$$1 - 1.3L - 0.75L^2 + 1.41L^3 - 0.34L^4 - 0.08L^5 + 0.09L^6 - 0.03L^7.$$

You can see that when you add seasonality, seasonal lag terms, and integration to a model, the size of XNew can grow quite large. A conversion such as this might not be ideal for analyses involving small sample sizes.
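The expanded polynomial can be checked by multiplying the four factors with conv, which performs polynomial multiplication on coefficient vectors. This is a small sketch in base MATLAB; the variable names are hypothetical, and the coefficient vectors are ordered from lag 0 upward.

```matlab
% Coefficient vectors ordered by increasing power of the lag operator L.
arPoly  = [1 -0.3 0.15];   % 1 - 0.3L + 0.15L^2  (nonseasonal AR)
dPoly   = [1 -1];          % 1 - L               (nonseasonal integration)
sarPoly = [1 0 -0.2];      % 1 - 0.2L^2          (seasonal AR)
sdPoly  = [1 0 -1];        % 1 - L^2             (seasonal integration)

% Multiply the polynomials; the result has degree 7, so 8 coefficients.
compound = conv(conv(arPoly,dPoly),conv(sarPoly,sdPoly));
numCols  = nnz(compound);  % number of nonzero coefficients
```

Here compound agrees with ARIMAX.Beta up to floating-point rounding, and numCols is the number of columns to expect in XNew.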

Fit a regression model with ARMA(1,1) errors by regressing the US gross domestic product (GDP) growth rate onto the US consumer price index (CPI) quarterly changes. Convert the fitted model to an ARIMAX model. Supply a timetable of data and specify the series for the fit.

Load and Transform Data

Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes.

load Data_USEconModel
DTT = price2ret(DataTimeTable,DataVariables="GDP");
DTT.GDPRate = 100*DTT.GDP;
DTT.CPIDel = diff(DataTimeTable.CPIAUCSL);
T = height(DTT) 
T = 248
figure
tiledlayout(2,1)
nexttile
plot(DTT.Time,DTT.GDPRate)
title("GDP Rate")
ylabel("Percent Growth")
nexttile
plot(DTT.Time,DTT.CPIDel)
title("Index")

The series appear stationary, albeit heteroscedastic.

Prepare Timetable for Estimation

When you plan to supply a timetable, you must ensure it has all the following characteristics:

  • The selected response variable is numeric and does not contain any missing values.

  • The timestamps in the Time variable are regular, and they are ascending or descending.

Remove all missing values from the timetable.

DTT = rmmissing(DTT);
T_DTT = height(DTT)
T_DTT = 248

Because each sample time has an observation for all variables, rmmissing does not remove any observations.

Determine whether the sampling timestamps have a regular frequency and are sorted.

areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   0

areTimestampsSorted = issorted(DTT.Time)
areTimestampsSorted = logical
   1

areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series.

Remedy the time irregularity by shifting all dates to the first day of the quarter.

dt = DTT.Time;
dt = dateshift(dt,"start","quarter");
DTT.Time = dt;
areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   1

DTT is regular.

Create Model Template for Estimation

Suppose that a regression model of the GDP rate onto CPI quarterly changes, with ARMA(1,1) errors, is appropriate.

Create a model template for a regression model with ARMA(1,1) errors.

Mdl = regARIMA(1,0,1)
Mdl = 
  regARIMA with properties:

     Description: "ARMA(1,1) Error Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
       Intercept: NaN
            Beta: [1×0]
               P: 1
               Q: 1
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
        Variance: NaN

Mdl is a partially specified regARIMA object.

Fit Model to Data

Fit a regression model with ARMA(1,1) errors to the data. Supply the entire timetable, which contains the GDP rate and CPI quarterly changes series, and specify the response and predictor variable names.

EstMdl = estimate(Mdl,DTT,ResponseVariable="GDPRate", ...
    PredictorVariables="CPIDel");
 
    Regression with ARMA(1,1) Error Model (Gaussian Distribution):
 
                  Value      StandardError    TStatistic      PValue  
                 ________    _____________    __________    __________

    Intercept      0.0162      0.0016077        10.077      6.9994e-24
    AR{1}         0.60515       0.089912        6.7305      1.6906e-11
    MA{1}        -0.16221        0.11051       -1.4678         0.14216
    Beta(1)      0.002221     0.00077691        2.8587       0.0042532
    Variance     0.000113     7.2753e-06        15.533      2.0838e-54

EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 1 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0.

Convert Fitted Model

Convert the fitted model to an ARIMAX model. Supply DTT and select the predictor variables from it. Return the timetable of predictor data for the converted model.

[ARIMAXMdl,Tbl2] = arima(EstMdl,PredictorTbl=DTT,PredictorVariables="CPIDel");
ARIMAXMdl
ARIMAXMdl = 
  arima with properties:

     Description: "ARIMAX(1,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 1
               D: 0
               Q: 1
        Constant: 0.00639649
              AR: {0.605153} at lag [1]
             SAR: {}
              MA: {-0.162208} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1 -0.605153]
        Variance: 0.000113005
tail(Tbl2)
    Time     Interval        GDP         GDPRate      CPIDel    Lag0XBeta    Lag1XBeta
    _____    ________    ___________    __________    ______    _________    _________

    Q2-07       91        0.00018278      0.018278     1.675    0.0037202    0.0045486
    Q3-07       91        0.00016916      0.016916     1.359    0.0030183    0.0037202
    Q4-07       94        6.1286e-05     0.0061286     3.355    0.0074515    0.0030183
    Q1-08       91        9.3272e-05     0.0093272      1.93    0.0042865    0.0074515
    Q2-08       91        0.00011103      0.011103     3.367    0.0074781    0.0042865
    Q3-08       92        8.9585e-05     0.0089585     1.641    0.0036447    0.0074781
    Q4-08       92       -0.00016145     -0.016145    -7.098    -0.015765    0.0036447
    Q1-09       90       -8.6878e-05    -0.0086878     1.137    0.0025253    -0.015765

ARIMAXMdl is an arima object representing the converted model. Tbl2 is a timetable containing the same variables as DTT and predictor variables for the exogenous regression component of ARIMAXMdl, Lag0XBeta and Lag1XBeta.
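The displayed values can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers printed above (so agreement holds only to display precision) and hypothetical variable names; it does not call estimate or arima.

```matlab
% Rounded estimates copied from the estimation display above.
intercept = 0.0162;      % Intercept
ar1       = 0.60515;     % AR{1}
regBeta   = 0.002221;    % Beta(1)

% Constant of the converted model: AR polynomial at L = 1 times the intercept.
constant = (1 - ar1)*intercept;      % close to the displayed 0.00639649

% CPIDel values for Q2-07, Q3-07, and Q4-07 from tail(Tbl2).
cpiDel = [1.675; 1.359; 3.355];
lag0   = cpiDel*regBeta;             % compare with the Lag0XBeta column
lag1   = [NaN; lag0(1:end-1)];       % Lag1XBeta is Lag0XBeta shifted down one row
```

The first entry of lag1 is NaN only because this sketch starts mid-sample; in Tbl2 the Q2-07 row of Lag1XBeta holds the previous quarter's value.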

Input Arguments

Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate.

The properties of Mdl cannot contain NaN values.

Predictor data xt for the regression component of the input regression model with ARIMA errors Mdl, specified as a numobs-by-numpredsMdl numeric matrix, where numpredsMdl is numel(Mdl.Beta).

The last row of X contains the latest observation.

Each column of X is a separate predictor variable.

Data Types: double

Since R2023b

Time series data containing predictor variables xt associated with the regression component of the input regression model with ARIMA errors Mdl, specified as a table or timetable with numvars1 variables and numobs rows.

Each selected predictor variable is a numeric vector representing a single path of numobs observations. You can optionally select numpredsMdl predictor variables from Tbl1 by using the PredictorVariables name-value argument.

Each row is an observation, and measurements in each row occur simultaneously.

If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending.

If Tbl1 is a table, the last row contains the latest observation.

Since R2023b

Variables to select from Tbl1 to treat as the predictor variables xt in the input regression model with ARIMA errors Mdl, specified as one of the following data types:

  • String vector or cell vector of character vectors containing numpredsMdl variable names in Tbl1.Properties.VariableNames

  • A length numpredsMdl vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames

  • A length numvars1 logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames

The selected variables must be numeric vectors and cannot contain missing values (NaN).

Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]

Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.

Data Types: double | logical | char | cell | string
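The three selection forms mirror ordinary table subscripting. As an illustration only, the sketch below builds a hypothetical four-variable table (made-up names and values) and shows that selecting by name, by index, and by logical vector picks the same variables. It uses plain table indexing rather than a call to arima, on the assumption that PredictorVariables resolves the forms analogously.

```matlab
% Hypothetical table; variable names and values are made up for illustration.
Tbl1 = table((1:5)',(6:10)',(11:15)',(16:20)', ...
    VariableNames=["M1SL" "GDP" "UNRATE" "CPI"]);

byName    = Tbl1(:,["M1SL" "UNRATE"]);        % string vector of variable names
byIndex   = Tbl1(:,[1 3]);                    % vector of unique positive integers
byLogical = Tbl1(:,[true false true false]);  % logical vector, one flag per variable

sameSelection = isequal(byName,byIndex) && isequal(byIndex,byLogical);
```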

Note

  • NaN values in X indicate missing values. The arima function accommodates NaN values such that observations in XNew corresponding to missing values in X are NaNs.

  • arima issues an error when any table or timetable input contains missing values.
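To see how a missing value in X carries through to the converted data, consider a hand-built lag construction. This is a sketch with hypothetical names and values, not a call to arima; it assumes a single predictor and converted columns at lags 0 and 1.

```matlab
% Hypothetical predictor column with one missing observation in row 3.
X       = [1; 2; NaN; 4; 5; 6];
regBeta = 0.5;
lags    = [0 1];
xb      = X*regBeta;                 % NaN*regBeta is NaN

% Shift x_t*beta down by each lag; leading rows have no history and stay NaN.
XNewSketch = NaN(numel(X),numel(lags));
for k = 1:numel(lags)
    XNewSketch(lags(k)+1:end,k) = xb(1:end-lags(k));
end
```

In this construction the NaN appears in row 3 of the lag-0 column and row 4 of the lag-1 column, which is the pattern the note describes for XNew.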

Output Arguments

ARIMAX model equivalent of the input regression model with ARIMA time series errors Mdl, returned as a fully specified arima model object.

Converted predictor data matrix for the exogenous regression component of the output ARIMAX model ARIMAXMdl, returned as a numobs-by-numpredsARIMAXMdl numeric matrix. numpredsARIMAXMdl is one plus the number of nonzero autoregressive coefficients in the difference equation of Mdl (see Algorithms). arima returns XNew only when you supply the numeric matrix input X.

The last row of XNew contains the latest observation of each series.

Each column of XNew is a separate predictor variable.

Data Types: double

Since R2023b

Converted predictor series, associated with the exogenous regression component of the output ARIMAX model ARIMAXMdl, returned as a table or timetable, the same data type as Tbl1. arima returns Tbl2 only when you supply the input Tbl1.

Tbl2 contains the following variables:

  • The converted predictor variables, which are numobs-by-1 numeric vectors. arima names the converted predictor variables in Tbl2 LagNumXBeta, where Num is the lag to which the predictor variable applies. The first converted predictor variable has name Lag0XBeta and applies to lag 0. The last predictor variable applies to lag Mdl.P. The arima function includes intermediate lags only when they are associated with nonzero autoregressive coefficients (see Algorithms).

  • All variables in Tbl1.

Each row is an observation, and measurements in each row occur simultaneously.

If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
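As a small illustration of the naming pattern, the following base MATLAB sketch generates the names that the LagNumXBeta convention implies for a hypothetical set of nonzero lags (0, 1, and 4). The lags are an assumption chosen for the example.

```matlab
% Hypothetical nonzero lags of the compound autoregressive polynomial.
lags  = [0 1 4];

% String concatenation converts each lag to text: "Lag" + 4 + "XBeta" is "Lag4XBeta".
names = "Lag" + lags + "XBeta";
```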

Algorithms

Let X denote the matrix of concatenated predictor data vectors (or design matrix) and β denote the vector of regression coefficients for the regression model with ARIMA errors, Mdl.

  • If you specify X or Tbl1, arima returns converted predictor data in XNew or Tbl2 using a certain format. Suppose that the nonzero autoregressive lag term degrees of Mdl are 0 < a1 < a2 < ...< P, which is the largest lag term degree. The software obtains these lag term degrees by expanding and reducing the product of the seasonal and nonseasonal autoregressive lag polynomials, and the seasonal and nonseasonal integration lag polynomials

    $$\phi(L)(1-L)^D\Phi(L)(1-L^s).$$

    • The first converted predictor variable is $X\beta$.

    • The second converted predictor variable is a sequence of $a_1$ NaNs, and then the product $X_{a_1}\beta$, where $X_{a_1}\beta = L^{a_1}X\beta$.

    • Converted predictor variable j is a sequence of $a_j$ NaNs, and then the product $X_{a_j}\beta$, where $X_{a_j}\beta = L^{a_j}X\beta$.

    • The last converted predictor variable is a sequence of $P$ NaNs, and then the product $X_P\beta$, where $X_P\beta = L^{P}X\beta$.

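    Suppose that Mdl is a regression model with ARIMA(3,1,0) errors, and ϕ1 = 0.2 and ϕ3 = 0.05. Then the product of the autoregressive and integration lag polynomials is

    $$(1 - 0.2L - 0.05L^3)(1 - L) = 1 - 1.2L + 0.2L^2 - 0.05L^3 + 0.05L^4.$$

    This implies that ARIMAXMdl.Beta is [1 -1.2 0.2 -0.05 0.05] and XNew is

    $$\begin{bmatrix}
    x_1\beta & \text{NaN} & \text{NaN} & \text{NaN} & \text{NaN} \\
    x_2\beta & x_1\beta & \text{NaN} & \text{NaN} & \text{NaN} \\
    x_3\beta & x_2\beta & x_1\beta & \text{NaN} & \text{NaN} \\
    x_4\beta & x_3\beta & x_2\beta & x_1\beta & \text{NaN} \\
    x_5\beta & x_4\beta & x_3\beta & x_2\beta & x_1\beta \\
    \vdots & \vdots & \vdots & \vdots & \vdots \\
    x_T\beta & x_{T-1}\beta & x_{T-2}\beta & x_{T-3}\beta & x_{T-4}\beta
    \end{bmatrix},$$

    where $x_j$ is row j of X.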

  • If you do not specify X or Tbl1, arima returns converted predictor data in XNew as an empty matrix without rows and a number of columns equal to one plus the number of nonzero autoregressive coefficients in the difference equation of Mdl.
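The column layout follows from applying the compound autoregressive polynomial to both sides of the regression equation. The derivation below is a sketch; the symbols $c$ (the intercept of Mdl), $a(L)$, $b(L)$, and $\gamma_k$ are introduced here for illustration and do not appear elsewhere on this page.

```latex
\begin{aligned}
y_t &= c + x_t\beta + u_t, \qquad a(L)\,u_t = b(L)\,\varepsilon_t, \\
a(L) &= \phi(L)(1-L)^D\Phi(L)(1-L^s)
      = \sum_{k \in \{0,\,a_1,\,\dots,\,P\}} \gamma_k L^k, \qquad \gamma_0 = 1, \\
a(L)\,y_t &= a(1)\,c + \sum_{k \in \{0,\,a_1,\,\dots,\,P\}} \gamma_k\, x_{t-k}\beta + b(L)\,\varepsilon_t .
\end{aligned}
```

Under this reading, the converted predictor column for lag $k$ holds $x_{t-k}\beta$, and the coefficients $\gamma_k$ form ARIMAXMdl.Beta. For the ARMA(4,1) example, $a(1)c = (1 - 0.8 + 0.4)(1) = 0.6$, which matches the reported Constant, and $\gamma = [1, -0.8, 0.4]$ matches the reported Beta.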

Version History

Introduced in R2013b
