forecast

Forecast responses of univariate regression model with ARIMA time series errors

Description

[Y,YMSE] = forecast(Mdl,numperiods) returns the numperiods-by-1 numeric vector of consecutive forecasted responses Y and the corresponding numeric vector of forecast mean square errors (MSE) YMSE of the fully specified, univariate regression model with ARIMA time series errors Mdl.

[Y,YMSE,U] = forecast(Mdl,numperiods) also forecasts a numperiods-by-1 numeric vector of unconditional disturbances U.

[___] = forecast(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. forecast returns the output argument combination for the corresponding input arguments. For example, forecast(Mdl,10,Y0=y0,X0=Pred0,XF=Pred) specifies the presample response path y0, and the presample and forecast sample predictor data Pred0 and Pred, respectively, to forecast a model with a regression component.

Tbl = forecast(Mdl,numperiods,Presample=Presample,PresampleRegressionDisturbanceVariable=PresampleRegressionDisturbanceVariable) returns the table or timetable Tbl containing a variable for each of the paths of response, forecast MSE, and unconditional disturbance series resulting from forecasting the regression model with ARIMA errors Mdl over a numperiods forecast horizon. Presample is a table or timetable containing presample unconditional disturbance data in the variable specified by PresampleRegressionDisturbanceVariable. Alternatively, Presample can contain presample error model innovation data in the variable specified by PresampleInnovationVariable or a combination of presample response and predictor data in the variables specified by PresampleResponseVariable and PresamplePredictorVariables. You can specify either alternative instead of PresampleRegressionDisturbanceVariable using name-value syntax; forecast infers presample unconditional disturbance data from either alternative specification. (since R2023b)

Tbl = forecast(Mdl,numperiods,InSample=InSample,PredictorVariables=PredictorVariables) specifies the variables PredictorVariables in the in-sample table or timetable of data InSample containing the predictor data for the model regression component. (since R2023b)

Tbl = forecast(Mdl,numperiods,Presample=Presample,PresampleRegressionDisturbanceVariable=PresampleRegressionDisturbanceVariable,InSample=InSample,PredictorVariables=PredictorVariables) specifies presample unconditional disturbance data to initialize the error model and in-sample predictor data for the regression component. You can choose different presample data from Presample when it is applicable. (since R2023b)

Tbl = forecast(___,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous three syntaxes. (since R2023b)

For example, forecast(Mdl,20,Presample=PSTbl,PresampleResponseVariable="GDP",PresamplePredictorVariables="CPI",InSample=Tbl,PredictorVariables="CPI") returns a table or timetable containing variables for the forecasted responses, forecast MSE, and forecasted unconditional disturbance paths, forecasted 20 periods into the future. forecast initializes the model by using the presample response and predictor data in the GDP and CPI variables of the timetable PSTbl. forecast applies the predictor data in the CPI variable of the table or timetable Tbl to the model regression component.

Examples

Return a vector of responses, forecasted over a 30-period horizon, from the following regression model with ARMA(2,1) errors:

$$y_t = X_t \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix} + u_t$$
$$u_t = 0.5u_{t-1} - 0.8u_{t-2} + \varepsilon_t - 0.5\varepsilon_{t-1},$$

where εt is Gaussian with variance 0.1.

Specify the model. Simulate responses from the model and two predictor series.

Mdl0 = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ...
    Beta=[0.1; -0.2],Variance=0.1);
rng(1,"twister");   % For reproducibility
T = 130;
numperiods = 30;
Pred =  randn(T,2);
y = simulate(Mdl0,T,X=Pred);

Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.

Mdl = regARIMA(2,0,1);
estidx = 1:(T-numperiods);  % Estimation sample indices
fhidx = (T-numperiods+1):T; % Forecast horizon
EstMdl = estimate(Mdl,y(estidx),X=Pred(estidx,:));
 
    Regression with ARMA(2,1) Error Model (Gaussian Distribution):
 
                   Value      StandardError    TStatistic      PValue  
                 _________    _____________    __________    __________

    Intercept    0.0074068      0.012554        0.58999          0.5552
    AR{1}          0.55422      0.087265          6.351      2.1391e-10
    AR{2}         -0.78361      0.080794        -9.6988      3.0499e-22
    MA{1}         -0.46483        0.1394        -3.3345      0.00085446
    Beta(1)       0.092779      0.024497         3.7873      0.00015228
    Beta(2)       -0.17339      0.021143        -8.2008      2.3874e-16
    Variance      0.073721      0.011006         6.6984      2.1066e-11

EstMdl is a new regARIMA model containing the estimates. The estimates are close to their true values.

Use EstMdl to forecast a 30-period horizon.

[yF,yMSE] = forecast(EstMdl,numperiods,Y0=y(estidx), ...
    X0=Pred(estidx,:),XF=Pred(fhidx,:));

yF is a 30-by-1 vector of forecasted responses and yMSE is a 30-by-1 vector of corresponding forecast MSEs. To initialize the model for forecasting, forecast infers required presample unconditional disturbances from the specified presample response and predictor data.
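
As a minimal sketch of how the three outputs relate, the following block reuses the data-generating parameter values from this example as a fully specified model (so no estimation is needed) and checks that the forecasted responses equal the intercept plus the regression component plus the forecasted unconditional disturbances. The names X, uF, yCheck, and maxAbsDiff are illustrative, and the identity is assumed from the model definition (response = intercept + predictors times Beta + unconditional disturbance) rather than from a documented guarantee.

```matlab
% Fully specified model with the parameter values used in this example
Mdl0 = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ...
    Beta=[0.1; -0.2],Variance=0.1);
rng(1,"twister");            % For reproducibility
X = randn(130,2);            % Illustrative predictor data
y = simulate(Mdl0,130,X=X);

% Forecast 30 periods and also request the unconditional disturbances
[yF,yMSE,uF] = forecast(Mdl0,30,Y0=y(1:100), ...
    X0=X(1:100,:),XF=X(101:130,:));

% Rebuild the response forecasts from their components (assumed identity)
yCheck = Mdl0.Intercept + X(101:130,:)*Mdl0.Beta(:) + uF;
maxAbsDiff = max(abs(yF - yCheck));
```

If the assumed identity holds, maxAbsDiff is zero up to rounding error.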

Visually compare the forecasts to the holdout data using a plot.

figure
plot(y,Color=[.7,.7,.7]);
hold on
plot(fhidx,yF,"b",LineWidth=2);
plot(fhidx,yF + 1.96*sqrt(yMSE),"r:",LineWidth=2);
plot(fhidx,yF - 1.96*sqrt(yMSE),"r:",LineWidth=2);
h = gca;
ph = patch([repmat(T-numperiods+1,1,2) repmat(T,1,2)], ...
    [h.YLim fliplr(h.YLim)],[0 0 0 0],"b");
ph.FaceAlpha = 0.1;
legend("Observed","Forecast","95% forecast interval", ...
    Location="best");
title("30-Period Forecasts and 95% Forecast Intervals")
axis tight
hold off

[Figure: 30-Period Forecasts and 95% Forecast Intervals. The plot shows the observed series, the forecast, and the 95% forecast interval.]

Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:

  • The predictors are randomly generated in this example. estimate treats the predictors as fixed. The 95% forecast intervals based on the estimates from estimate do not account for the variability in the predictors.

  • By sheer chance, the estimation period seems less volatile than the forecast period. estimate uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on the estimates should not be expected to cover observations that have an underlying innovations process with larger variability.

Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.

Fit a regression model with ARMA(1,1) errors by regressing the US gross domestic product (GDP) growth rate onto consumer price index (CPI) quarterly changes. Forecast the model into a 2-year (8-quarter) horizon. Supply a timetable of data and specify the series for the fit.

Load and Transform Data

Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes.

load Data_USEconModel
DTT = price2ret(DataTimeTable,DataVariables="GDP");
DTT.GDPRate = 100*DTT.GDP;
DTT.CPIDel = diff(DataTimeTable.CPIAUCSL);
T = height(DTT) 
T = 248
figure
tiledlayout(2,1)
nexttile
plot(DTT.Time,DTT.GDPRate)
title("GDP Rate")
ylabel("Percent Growth")
nexttile
plot(DTT.Time,DTT.CPIDel)
title("Index")

[Figure: two stacked line plots. Top panel: GDP Rate (y-axis: Percent Growth). Bottom panel: Index.]

The series appear stationary, albeit heteroscedastic.

Prepare Timetable for Estimation

When you plan to supply a timetable, you must ensure it has all the following characteristics:

  • The selected response variable is numeric and does not contain any missing values.

  • The timestamps in the Time variable are regular, and they are ascending or descending.

Remove all missing values from the timetable.

DTT = rmmissing(DTT);
T_DTT = height(DTT)
T_DTT = 248

Because each sample time has an observation for all variables, rmmissing does not remove any observations.

Determine whether the sampling timestamps have a regular frequency and are sorted.

areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   0

areTimestampsSorted = issorted(DTT.Time)
areTimestampsSorted = logical
   1

areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series.

Remedy the time irregularity by shifting all dates to the first day of the quarter.

dt = DTT.Time;
dt = dateshift(dt,"start","quarter");
DTT.Time = dt;
areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   1

DTT is regular.
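
The same remedy can be sketched on a small synthetic timetable, which may help when checking your own data. The dates, the variable name x, and the names regularBefore and regularAfter are made up for illustration; only datetime, timetable, isregular, and dateshift are used.

```matlab
% Quarter-end timestamps (illustrative dates, not from the data set)
t = datetime([2020 2020 2020 2021]',[6 9 12 3]',[30 30 31 31]');
TT = timetable(t,(1:4)',VariableNames="x");

% Quarter-end dates fall on different days of the month
regularBefore = isregular(TT,"quarters");

% Shift each timestamp to the first day of its quarter
TT.Properties.RowTimes = dateshift(TT.Properties.RowTimes,"start","quarter");
regularAfter = isregular(TT,"quarters");
```

Shifting changes only the timestamps, not the order or the values of the observations.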

Create Model Template for Estimation

Suppose that a regression model of the GDP rate onto CPI quarterly changes, with ARMA(1,1) errors, is appropriate.

Create a model template for a regression model with ARMA(1,1) errors. Specify the response variable name.

Mdl = regARIMA(1,0,1);
Mdl.SeriesName = "GDPRate";

Mdl is a partially specified regARIMA object.

Partition Data

Partition the data set into estimation and forecast samples.

fh = 8;
DTTES = DTT(1:(T_DTT-fh),:);
DTTFS = DTT((T_DTT-fh+1):end,:);

Fit Model to Data

Fit a regression model with ARMA(1,1) errors to the estimation sample. Supply the estimation sample timetable, which contains the GDP rate and CPI quarterly changes series, and specify the predictor variable name.

EstMdl = estimate(Mdl,DTTES,PredictorVariables="CPIDel");
 
    Regression with ARMA(1,1) Error Model (Gaussian Distribution):
 
                   Value       StandardError    TStatistic      PValue  
                 __________    _____________    __________    __________

    Intercept      0.016489      0.0017307        9.5272      1.6152e-21
    AR{1}           0.57835       0.096952        5.9653      2.4415e-09
    MA{1}          -0.15125        0.11658       -1.2974         0.19449
    Beta(1)       0.0025095      0.0014147        1.7738        0.076089
    Variance     0.00011319     7.5405e-06         15.01      6.2792e-51

EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 1 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0.

Forecast Estimated Model

Forecast the GDP rate over an 8-quarter horizon. Use the estimation sample as a presample for the forecast.

Tbl = forecast(EstMdl,fh,Presample=DTTES,PresampleResponseVariable="GDPRate", ...
    PresamplePredictorVariables="CPIDel",InSample=DTTFS, ...
    PredictorVariables="CPIDel")
Tbl=8×7 timetable
    Time     Interval        GDP         GDPRate      CPIDel    GDPRate_Response    GDPRate_MSE    GDPRate_RegressionInnovation
    _____    ________    ___________    __________    ______    ________________    ___________    ____________________________

    Q2-07       91        0.00018278      0.018278     1.675         0.015765       0.00011319              -0.0049278         
    Q3-07       91        0.00016916      0.016916     1.359          0.01705       0.00013383                -0.00285         
    Q4-07       94        6.1286e-05     0.0061286     3.355          0.02326       0.00014074              -0.0016483         
    Q1-08       91        9.3272e-05     0.0093272      1.93         0.020379       0.00014305             -0.00095329         
    Q2-08       91        0.00011103      0.011103     3.367         0.024387       0.00014382             -0.00055134         
    Q3-08       92        8.9585e-05     0.0089585     1.641         0.020288       0.00014408             -0.00031887         
    Q4-08       92       -0.00016145     -0.016145    -7.098       -0.0015075       0.00014417             -0.00018442         
    Q1-09       90       -8.6878e-05    -0.0086878     1.137         0.019236        0.0001442             -0.00010666         

Tbl is an 8-by-7 timetable containing the forecasted responses GDPRate_Response and their forecast MSEs GDPRate_MSE, the forecasted unconditional disturbances GDPRate_RegressionInnovation, and all variables in DTTFS.

Plot the forecasts and 95% forecast intervals.

Tbl.Lower = Tbl.GDPRate_Response - 1.96*sqrt(Tbl.GDPRate_MSE);
Tbl.Upper = Tbl.GDPRate_Response + 1.96*sqrt(Tbl.GDPRate_MSE);

figure
h1 = plot(DTT.Time(end-65:end),DTT.GDPRate(end-65:end), ...
    Color=[.7,.7,.7]);
hold on
h2 = plot(Tbl.Time,Tbl.GDPRate_Response,"b",LineWidth=2);
h3 = plot(Tbl.Time,Tbl.Lower,"r:",LineWidth=2);
plot(Tbl.Time,Tbl.Upper,"r:",LineWidth=2);
ha = gca;
title("GDP Rate Forecasts and 95% Forecast Intervals")
ph = patch([repmat(Tbl.Time(1),1,2) repmat(Tbl.Time(end),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],"b");
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],["Observed GDP rate" "Forecasted GDP rate", ...
    "95% forecast interval"],Location="best")
axis tight
hold off

[Figure: GDP Rate Forecasts and 95% Forecast Intervals. The plot shows the observed GDP rate, the forecasted GDP rate, and the 95% forecast interval.]

Fit a regression model with ARIMA(1,1,1) errors by regressing the quarterly log US GDP onto the CPI. Compute MMSE forecasts of the log GDP series using the estimated model. Supply data in timetables.

Load the US macroeconomic data set. Compute the log GDP series.

load Data_USEconModel
DTT = DataTimeTable;
DTT.LogGDP = log(DTT.GDP);
T = height(DTT);

Remedy the time irregularity by shifting all dates to the first day of the quarter.

dt = DTT.Time;
dt = dateshift(dt,"start","quarter");
DTT.Time = dt;

Reserve 2 years (8 quarters) of data at the end of the series to compare against the forecasts.

numperiods = 8;
DTTES = DTT(1:(T-numperiods),:);    % Estimation sample
DTTFS = DTT((T-numperiods+1):T,:);  % Forecast horizon

Suppose that a regression model of the quarterly log GDP on CPI, with ARIMA(1,1,1) errors, is appropriate.

Create a model template for a regression model with ARIMA(1,1,1) errors. Specify the response variable name.

Mdl = regARIMA(1,1,1);
Mdl.SeriesName = "LogGDP";

The intercept is not identifiable in a regression model with integrated errors. Fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression. Use the estimation sample.

coeff = [ones(T-numperiods,1) DTTES.CPIAUCSL]\DTTES.LogGDP;
Mdl.Intercept = coeff(1);

Consider performing a sensitivity analysis by using a grid of intercepts.
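
One possible shape for such a sensitivity analysis is sketched below. It uses numeric arrays rather than timetables for brevity. The grid offsets (plus or minus 0.5 around the least-squares intercept) and the names interceptGrid, betaHat, and yEnd are hypothetical choices for illustration, not recommendations.

```matlab
load Data_USEconModel
y = log(DataTimeTable.GDP);        % Response: log GDP
X = DataTimeTable.CPIAUCSL;        % Predictor: CPI
T = numel(y);
numperiods = 8;
estidx = 1:(T-numperiods);         % Estimation sample
fhidx = (T-numperiods+1):T;        % Forecast horizon

% Center a small, hypothetical grid on the least-squares intercept
coeff = [ones(numel(estidx),1) X(estidx)]\y(estidx);
interceptGrid = coeff(1) + [-0.5 0 0.5];

betaHat = zeros(size(interceptGrid));
yEnd = zeros(size(interceptGrid));
for k = 1:numel(interceptGrid)
    Mdl = regARIMA(1,1,1);
    Mdl.Intercept = interceptGrid(k);   % Fix the intercept before estimation
    EstMdl = estimate(Mdl,y(estidx),X=X(estidx),Display="off");
    yF = forecast(EstMdl,numperiods,Y0=y(estidx), ...
        X0=X(estidx),XF=X(fhidx));
    betaHat(k) = EstMdl.Beta;           % Estimated regression coefficient
    yEnd(k) = yF(end);                  % Forecast at the end of the horizon
end
```

Comparing betaHat and yEnd across the grid gives a rough indication of how much the fixed intercept matters for the quantities of interest.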

Fit a regression model with ARIMA(1,1,1) errors to the estimation sample. Specify the predictor variable name.

EstMdl = estimate(Mdl,DTTES,PredictorVariables="CPIAUCSL");
 
    Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution):
 
                   Value       StandardError    TStatistic      PValue   
                 __________    _____________    __________    ___________

    Intercept        5.8303              0           Inf                0
    AR{1}           0.92869       0.028414        32.684      2.6118e-234
    MA{1}          -0.39063       0.057599       -6.7819       1.1858e-11
    Beta(1)       0.0029335      0.0014645        2.0031         0.045166
    Variance     0.00010668     6.9256e-06        15.403       1.5539e-53

EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 2 presample regression model residuals and sets the required Mdl.Q = 1 presample error model residual to 0.

Infer estimation sample unconditional disturbances to initialize the model for forecasting. Specify the predictor variable name.

Tbl0 = infer(EstMdl,DTTES,PredictorVariables="CPIAUCSL");

Forecast the estimated model over an 8-quarter horizon. Use the inferred unconditional disturbances as presample data. Specify the forecast sample predictor data and its variable name, and specify the presample unconditional disturbance variable name.

Tbl = forecast(EstMdl,numperiods,Presample=Tbl0, ...
   PresampleRegressionDisturbanceVariable="LogGDP_RegressionResidual", ...
   InSample=DTTFS,PredictorVariables="CPIAUCSL");

Plot the forecasted log GDP with approximate 95% forecast intervals. Also, separately plot the unconditional disturbances.

Tbl.Lower = Tbl.LogGDP_Response - 1.96*sqrt(Tbl.LogGDP_MSE);
Tbl.Upper = Tbl.LogGDP_Response + 1.96*sqrt(Tbl.LogGDP_MSE);
figure
tiledlayout(2,1)
nexttile
plot(DTT.Time(end-40:end),DTT.LogGDP(end-40:end),Color=[.7,.7,.7])
hold on
h1 = plot(Tbl.Time,[Tbl.Lower Tbl.Upper],"r:",LineWidth=2);
h2 = plot(Tbl.Time,Tbl.LogGDP_Response,"k",LineWidth=2);
h = gca;
ph = patch([repmat(Tbl.Time(1),1,2) repmat(Tbl.Time(end),1,2)], ...
   [h.YLim fliplr(h.YLim)],[0 0 0 0],"b");
ph.FaceAlpha = 0.1;
legend([h1(1) h2],["95% percentile intervals" "MMSE forecast"], ...
    Location="northwest")
axis tight
grid on
title("Log GDP Forecast Over 2-year Horizon")
hold off
nexttile
plot(DTT.Time,[Tbl0.LogGDP_RegressionResidual; Tbl.LogGDP_RegressionInnovation])
hold on
h = gca;
ph = patch([repmat(Tbl.Time(1),1,2) repmat(Tbl.Time(end),1,2)], ...
   [h.YLim fliplr(h.YLim)],[0 0 0 0],"b");
ph.FaceAlpha = 0.1;
axis tight
grid on
title("Unconditional Disturbances")
hold off

[Figure: two panels. Top panel: Log GDP Forecast Over 2-year Horizon, showing the 95% percentile intervals and the MMSE forecast. Bottom panel: Unconditional Disturbances.]

The unconditional disturbances, ut, are nonstationary; therefore, the widths of the forecast intervals grow with time.
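
As background, the growth of the interval widths can be read from the standard expression for the h-step-ahead forecast error variance of a linear time series model. The notation here is introduced only for this note: the coefficients \psi_j come from expanding the error model as a moving average in the innovations, \sigma^2 is the innovation variance, and T is the last presample time. Because the predictors are treated as fixed, the forecast MSE of the response is taken to equal that of the unconditional disturbance.

```latex
\operatorname{MSE}\left(\hat{u}_{T+h}\right)
  = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2 ,
  \qquad \psi_0 = 1 .
```

For a stationary error model, the \psi_j decay toward zero and the sum converges as h grows, so the interval widths level off. With an integrated error model (D = 1), the \psi_j do not decay to zero, so each additional period adds a non-vanishing term and the forecast MSE grows without bound.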

Input Arguments

Mdl

Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate.

The properties of Mdl cannot contain NaN values.

numperiods

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

Data Types: double

Presample

Since R2023b

Presample data containing presample responses yt, predictors xt, unconditional disturbances ut, or error model innovations εt, to initialize the model, specified as a table or timetable with numprevars variables and numpreobs rows. You can select a response, error model innovation, unconditional disturbance, or multiple predictor variables from Presample by using the PresampleResponseVariable, PresampleInnovationVariable, PresampleRegressionDisturbanceVariable, or PresamplePredictorVariables name-value argument, respectively.

numpreobs is the number of presample observations. numpaths is the maximum number of independent presample paths among the specified variables, from which forecast initializes the resulting numpaths forecasts (see Algorithms).

For all selected variables except predictor variables, each variable contains a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-by-numpaths matrix) of presample response, error model innovation, or unconditional disturbance data.

Each selected predictor variable contains a single path of observations. forecast applies all selected predictor variables to each forecasted path.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. forecast uses only the latest required rows. For more details, see Time Base Partitions for Forecasting.

Presample unconditional disturbances ut are required to initialize the error model for forecasting. You can specify presample unconditional disturbances in one of the following ways:

  • Specify numpreobs ≥ Mdl.P presample response and predictor data to enable forecast to infer presample unconditional disturbances.

  • Specify numpreobs ≥ Mdl.P presample unconditional disturbances without presample error model innovations. forecast ignores specified presample response and predictor data.

  • Specify numpreobs ≥ Mdl.Q presample error model innovations without presample unconditional disturbances. forecast ignores specified presample response and predictor data.

  • Specify numpreobs ≥ max(Mdl.P,Mdl.Q) presample error model innovations and unconditional disturbances only. forecast ignores specified presample response and predictor data.

If Presample is a timetable, all the following conditions must be true:

  • Presample must represent a sample with a regular datetime time step (see isregular).

  • The datetime vector of sample timestamps Presample.Time must be ascending or descending.

If Presample is a table, the last row contains the latest presample observation.

By default, forecast sets all necessary presample unconditional disturbances in one of the following ways:

  • If forecast cannot infer enough unconditional disturbances from specified presample response and predictor data, forecast sets all necessary presample unconditional disturbances to zero.

  • If you specify at least Mdl.P + Mdl.Q presample unconditional disturbances, forecast infers all necessary presample error model innovations from the specified presample unconditional disturbances. Otherwise, forecast sets all necessary presample error model innovations to zero.

PresampleRegressionDisturbanceVariable

Since R2023b

Presample unconditional disturbance variable ut to select from Presample containing presample unconditional disturbance data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleRegressionDisturbanceVariable(j) = true selects variable j from Presample.Properties.VariableNames

The selected variable must be a numeric vector and cannot contain missing values (NaNs).

If you specify presample unconditional disturbance data in Presample, you must specify PresampleRegressionDisturbanceVariable.

Example: PresampleRegressionDisturbanceVariable="StockRateU0"

Example: PresampleRegressionDisturbanceVariable=[false false true false] or PresampleRegressionDisturbanceVariable=3 selects the third table variable as the presample unconditional disturbance variable.

Data Types: double | logical | char | cell | string

InSample

Since R2023b

Forecasted (future) predictor data for the model regression component, specified as a table or timetable. InSample contains numvars variables, including numpreds predictor variables xt.

forecast returns the forecasted variables in the output table or timetable Tbl, which is commensurate with InSample.

Each row corresponds to an observation in the forecast horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. InSample must have at least numperiods rows to cover the forecast horizon. If you supply more rows than necessary, forecast uses only the first numperiods rows.

Each selected predictor variable is a numeric vector without missing values (NaNs). forecast applies the specified predictor variables to all forecasted paths.

If InSample is a timetable, the following conditions apply:

  • InSample must represent a sample with a regular datetime time step (see isregular).

  • The datetime vector InSample.Time must be ascending or descending.

  • Presample must immediately precede InSample, with respect to the sampling frequency.

If InSample is a table, the last row contains the latest observation.

By default, forecast does not include the regression component in the model, regardless of the value of Mdl.Beta.

PredictorVariables

Since R2023b

Predictor variables xt to select from InSample containing predictor data for the model regression component in the forecast horizon, specified as one of the following data types:

  • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames

  • A vector of unique indices (positive integers) of variables to select from InSample.Properties.VariableNames

  • A logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames

The selected variables must be numeric vectors and cannot contain missing values (NaNs).

By default, forecast excludes the regression component, regardless of its presence in Mdl.

Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]

Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.

Data Types: double | logical | char | cell | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: forecast(Mdl,10,Y0=y0,X0=Pred0,XF=Pred) specifies the presample response path y0, and the presample and forecast sample predictor data Pred0 and Pred, respectively, to forecast a model with a regression component.

Y0

Presample response data yt to infer presample unconditional disturbances ut, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numpaths numeric matrix. When you supply Y0, supply all optional data as numeric arrays, and forecast returns results in numeric arrays.

Presample unconditional disturbances ut are required to initialize the error model for forecasting. forecast infers presample unconditional disturbances from Y0 and specified presample predictor data X0. Therefore, if you specify presample unconditional disturbances U0, forecast ignores Y0 and X0.

numpreobs is the number of presample observations. numpaths is the number of independent presample paths, from which forecast initializes the resulting numpaths forecasts (see Algorithms).

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the model. If numpreobs > Mdl.P, forecast uses only the latest Mdl.P rows. For more details, see Time Base Partitions for Forecasting.

Columns of Y0 correspond to separate, independent presample paths.

  • If Y0 is a column vector, it represents a single path of the response series. forecast applies it to each forecasted path. In this case, all forecast paths Y derive from the same initial responses.

  • If Y0 is a matrix, each column represents a presample path of the response series. numpaths is the maximum among the second dimensions of the specified presample observation matrices Y0, E0, and U0.
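
A minimal sketch of the multiple-path case follows, using a model without a regression component to keep it short. The parameter values, the three simulated presample paths, and the name sizeY are arbitrary choices for illustration.

```matlab
% Fully specified model with ARMA(1,1) errors and no regression component
Mdl = regARIMA(Intercept=0,AR=0.5,MA=0.2,Variance=1);
rng(1,"twister");                      % For reproducibility

% Three independent presample response paths (one per column)
Y0 = simulate(Mdl,20,NumPaths=3);

% Each column of Y0 initializes its own forecast path
Y = forecast(Mdl,10,Y0=Y0);
sizeY = size(Y);                       % numperiods-by-numpaths
```

Here Y has one column per presample path, so the forecasts in column j derive from the presample responses in column j of Y0.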

By default, forecast defers to specified or default presample unconditional disturbances U0.

Data Types: double

X0

Presample predictor data xt used to infer the presample unconditional disturbances ut, specified as a numpreobs-by-numpreds numeric matrix. Use X0 only when you supply the numeric array of presample response data Y0 and your model contains a regression component. numpreds = numel(Mdl.Beta).

Presample unconditional disturbances ut are required to initialize the error model for forecasting. forecast infers presample unconditional disturbances from X0 and specified presample response data Y0. Therefore, if you specify presample unconditional disturbances U0, forecast ignores Y0 and X0.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the model. If numpreobs > Mdl.P, forecast uses only the latest Mdl.P rows. For more details, see Time Base Partitions for Forecasting.

Each column is an individual predictor variable. forecast applies X0 to each path; that is, X0 represents one path of observed predictors.

If you specify X0 but you do not specify forecasted predictor data XF, forecast issues an error.

By default, forecast drops the regression component from the model when it infers presample unconditional disturbances, regardless of the value of the regression coefficient Mdl.Beta.

Data Types: double

U0

Presample unconditional disturbance data ut to initialize the autoregressive (AR) component of the ARIMA error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numpaths numeric matrix. When you supply U0, supply all optional data as numeric arrays, and forecast returns results in numeric arrays.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the model. If numpreobs > Mdl.P, forecast uses only the latest Mdl.P rows. For more details, see Time Base Partitions for Forecasting.

Columns of U0 correspond to separate, independent presample paths.

  • If U0 is a column vector, it represents a single path of the unconditional disturbance series. forecast applies it to each forecasted path. In this case, all forecasted paths derive from the same initial unconditional disturbances.

  • If U0 is a matrix, each column represents a presample path of the unconditional disturbance series. numpaths is the maximum among the second dimensions of the specified presample observation matrices Y0, E0, and U0.

By default, if the presample data (Y0 and X0) contains at least Mdl.P rows, forecast infers U0 from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are zero.
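
The following sketch compares the two ways of initializing the error model described above: letting forecast infer the presample unconditional disturbances from Y0 and X0, and passing them directly as U0. It assumes that the inferred disturbances are the presample responses minus the intercept and the regression component, as the model definition suggests. The parameter values and the names u0, yF1, yF2, and maxAbsDiff are illustrative.

```matlab
% Fully specified model with a regression component (illustrative values)
Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ...
    Beta=[0.1; -0.2],Variance=0.1);
rng(1,"twister");                       % For reproducibility
X = randn(110,2);
y = simulate(Mdl,110,X=X);
y0 = y(1:100);   X0 = X(1:100,:);       % Presample data
XF = X(101:110,:);                      % Forecast sample predictors

% Option 1: forecast infers presample disturbances from Y0 and X0
yF1 = forecast(Mdl,10,Y0=y0,X0=X0,XF=XF);

% Option 2: compute the disturbances from the model definition (assumption)
u0 = y0 - Mdl.Intercept - X0*Mdl.Beta(:);
yF2 = forecast(Mdl,10,U0=u0,XF=XF);

maxAbsDiff = max(abs(yF1 - yF2));
```

Under the stated assumption, the two forecasts agree up to rounding error, which is one way to confirm that a manually prepared U0 matches what forecast would infer.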

Data Types: double

E0

Presample error model innovation data εt used to initialize the moving average (MA) component of the ARIMA error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numpaths numeric matrix. Use E0 only when you supply the numeric array of presample response data Y0. forecast assumes that the presample innovations have a mean of zero.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the model. If numpreobs is greater than required, forecast uses only the latest required rows.

Columns of E0 correspond to separate, independent presample paths.

  • If E0 is a column vector, it represents a single path of the innovation series. forecast applies it to each forecasted path. In this case, all forecasts derive from the same initial error model innovations.

  • If E0 is a matrix, each column represents a presample path of the error model innovation series. numpaths is the maximum among the second dimensions of the specified presample observation matrices Y0, E0, and U0.

By default, if U0 contains at least Mdl.P + Mdl.Q rows, forecast infers E0 from U0. If U0 has an insufficient number of rows and forecast cannot infer sufficient observations of U0 from the presample data (Y0 and X0), forecast sets necessary presample error model innovations to zero.

Data Types: double

PresampleResponseVariable

Since R2023b

Response variable yt to select from Presample containing the presample response data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames

forecast uses specified presample response and predictor data to infer presample unconditional disturbances. If you specify enough presample unconditional disturbances or error model innovations by using Presample and PresampleRegressionDisturbanceVariable or PresampleInnovationVariable, forecast ignores PresamplePredictorVariables and PresampleResponseVariable.

The selected variable must be a numeric vector and cannot contain missing values (NaNs).

If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable.

Example: PresampleResponseVariable="StockRate"

Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable as the response variable.

Data Types: double | logical | char | cell | string

Since R2023b

Presample predictor variables xt to select from Presample containing presample predictor data for the regression component in the presample period, specified as one of the following data types:

  • String vector or cell vector of character vectors containing numpreds variable names in Presample.Properties.VariableNames

  • A vector of unique indices (positive integers) of variables to select from Presample.Properties.VariableNames

  • A logical vector, where PresamplePredictorVariables(j) = true selects variable j from Presample.Properties.VariableNames

forecast uses specified presample response and predictor data to infer presample unconditional disturbances. If you specify enough presample unconditional disturbances or error model innovations by using Presample and PresampleRegressionDisturbanceVariable or PresampleInnovationVariable, forecast ignores PresamplePredictorVariables and PresampleResponseVariable.

The selected variables must be numeric vectors and cannot contain missing values (NaNs).

If you specify presample predictor data, you must also specify in-sample predictor data by using the InSample and PredictorVariables name-value arguments.

By default, forecast excludes the regression component, regardless of its presence in Mdl.

Example: PresamplePredictorVariables=["M1SL" "TB3MS" "UNRATE"]

Example: PresamplePredictorVariables=[true false true false] or PresamplePredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.

Data Types: double | logical | char | cell | string

Since R2023b

Error model innovation variable εt to select from Presample containing the presample error model innovation data, specified as one of the following data types:

  • String scalar or character vector containing a variable name in Presample.Properties.VariableNames

  • Variable index (positive integer) to select from Presample.Properties.VariableNames

  • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames

The selected variable must be a numeric matrix and cannot contain missing values (NaNs).

If you specify presample error model innovation data in Presample, you must specify PresampleInnovationVariable.

Example: PresampleInnovationVariable="StockRateDist0"

Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable as the presample error model innovation variable.

Data Types: double | logical | char | cell | string

Forecasted (or future) predictor data, specified as a numeric matrix with numpreds columns. XF represents the evolution of specified presample predictor data X0 forecasted into the future (the forecast period). Use XF only when you supply the numeric array of presample response and predictor data Y0 and X0, respectively.

Rows of XF correspond to time points in the future; XF(t,:) contains the t-period-ahead predictor forecasts. XF must have at least numperiods rows. If the number of rows exceeds numperiods, forecast uses only the first (earliest) numperiods forecasts. For more details, see Time Base Partitions for Forecasting.

Columns of XF are separate time series variables, and they correspond to the columns of X0 and Mdl.Beta.

forecast treats XF as a fixed (nonstochastic) matrix.

By default, the forecast function generates forecasts from Mdl without a regression component, regardless of the value of the regression coefficient Mdl.Beta.
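
The default behavior can be checked with a small sketch. The model below is hypothetical (one predictor, AR(1) errors, illustrative coefficient values), and XF is an assumed future predictor path. Without presample data, the required presample unconditional disturbances are zero, so the two forecasts differ only by the regression component.

```matlab
% Hypothetical model: y_t = 1 + 2*x_t + u_t, where u_t follows an AR(1)
% process (all coefficient values are illustrative).
Mdl = regARIMA(Intercept=1,Beta=2,AR={0.5},Variance=1);

numperiods = 3;
XF = ones(numperiods,1);   % assumed future predictor path (one predictor)

% Without XF, forecast omits the regression component even though
% Mdl.Beta is nonempty.
YNoReg = forecast(Mdl,numperiods);

% With XF, the forecasted responses include the term XF*Mdl.Beta.
YReg = forecast(Mdl,numperiods,XF=XF);

YReg - YNoReg   % difference attributable to the regression component
```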

Note

  • NaN values in X0, Y0, U0, E0, and XF indicate missing values. forecast removes missing values from specified data by list-wise deletion.

    • For the presample, forecast horizontally concatenates the possibly jagged arrays X0, Y0, U0, and E0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN.

    • For in-sample data, forecast removes any row of XF containing at least one NaN.

    This type of data reduction reduces the effective sample size and can create an irregular time series.

  • For numeric data inputs, forecast assumes that you synchronize the presample data such that the latest observations occur simultaneously.

  • forecast issues an error when any table or timetable input contains missing values.

  • Set presample response and predictor data to the same response and predictor data as used in the estimation, simulation, or inference of Mdl. This assignment ensures correct inference of the required presample unconditional disturbances.

  • To include a regression component in the response forecast, you must specify the forecasted predictor data. You can specify forecasted predictor data without also specifying presample predictor data, but forecast issues an error when you specify presample predictor data without also specifying forecasted predictor data.
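
The list-wise deletion described in this note can be sketched in plain MATLAB. This is one way to reproduce the documented behavior on numeric arrays, not the internal code of forecast. The data values are made up, and trimming every array to the shortest number of rows is an assumption about how the arrays are aligned on their last rows.

```matlab
% Illustrative presample arrays with different numbers of rows. The last
% row of each array is the latest observation.
Y0 = [1; 2; NaN; 4; 5];                   % 5 observations, 1 path
X0 = [10 20; 11 21; 12 NaN; 13 23];       % 4 observations, 2 predictors

% Assumption: align on the last rows by keeping only the latest n rows
% of each array, where n is the smallest number of rows.
n = min(size(Y0,1),size(X0,1));
aligned = [Y0(end-n+1:end,:) X0(end-n+1:end,:)];

% List-wise deletion: drop every row that contains at least one NaN.
keep = ~any(isnan(aligned),2);
cleaned = aligned(keep,:);

Y0clean = cleaned(:,1);
X0clean = cleaned(:,2:end);
```

In this sketch, only the first and last aligned rows survive, which shows how the deletion shrinks the effective sample and leaves observations that are no longer equally spaced in time.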

Output Arguments

MMSE forecasted responses yt, returned as a numperiods-by-1 column vector or a numperiods-by-numpaths numeric matrix. Y represents a continuation of Y0 (Y(1,:) occurs in the time point immediately after Y0(end,:)). forecast returns Y by default and when you supply optional presample data in numeric arrays.

Y(t,:) contains the t-period-ahead forecasts, or the forecast of all paths for time point t in the forecast period.

forecast determines numpaths from the number of columns in the presample data sets Y0, E0, and U0. For details, see Algorithms. If each presample data set has one column, Y is a column vector.

Data Types: double

MSE of the forecasted responses Y (forecast error variances), returned as a numperiods-by-1 column vector or a numperiods-by-numpaths numeric matrix. forecast returns YMSE by default and when you supply optional presample data in numeric arrays.

YMSE(t,:) contains the forecast error variances of all paths for time point t in the forecast period.

forecast determines numpaths from the number of columns in the presample data sets Y0, E0, and U0. For details, see Algorithms. If you do not specify any presample data sets, or if each data set is a column vector, YMSE is a column vector.

The square roots of YMSE are the standard errors of the forecasts Y.

Data Types: double
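
As a sketch of how YMSE is typically used, the following example converts forecast MSEs to standard errors and approximate 95% forecast intervals. The model is hypothetical (intercept-only with AR(1) errors and illustrative values), and the 1.96 multiplier assumes Gaussian innovations. For AR(1) errors with coefficient phi and innovation variance sigma2, the h-period-ahead MSE has the standard closed form sigma2*(1 - phi^(2h))/(1 - phi^2), which the sketch compares with YMSE.

```matlab
% Hypothetical intercept-only model with AR(1) errors (illustrative values).
phi = 0.5;
sigma2 = 1;
Mdl = regARIMA(Intercept=0,AR={phi},Variance=sigma2);

numperiods = 4;
[Y,YMSE] = forecast(Mdl,numperiods,U0=2);

% Standard errors of the forecasts.
se = sqrt(YMSE);

% Approximate 95% forecast intervals, assuming Gaussian innovations.
lower = Y - 1.96*se;
upper = Y + 1.96*se;

% Closed-form h-period-ahead MSE for AR(1) errors, for comparison.
h = (1:numperiods)';
mseClosedForm = sigma2*(1 - phi.^(2*h))/(1 - phi^2);
maxDiff = max(abs(YMSE - mseClosedForm));   % should be near zero
```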

MMSE forecasts of ARIMA error model unconditional disturbances, returned as a numperiods-by-1 column vector or a numperiods-by-numpaths numeric matrix. U represents a continuation of U0 (U(1,:) occurs in the time point immediately after U0(end,:)). forecast returns U by default and when you supply optional presample data in numeric arrays.

U(t,:) contains the t-period-ahead forecasted unconditional disturbances, or the conditional mean forecast of the error model over all paths for time point t in the forecast period.

forecast determines numpaths from the number of columns in the presample data sets Y0, E0, and U0. For details, see Algorithms.

Data Types: double

Since R2023b

Paths of MMSE forecasts of responses yt, corresponding forecast MSEs, and MMSE forecasts of unconditional disturbances ut, returned as a table or timetable, the same data type as Presample or InSample. forecast returns Tbl only when you supply Presample or InSample.

Tbl contains the following variables:

  • The forecasted response paths, which are in a numperiods-by-numpaths numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths, each corresponding to the input presample paths in Presample or preceding the in-sample period in InSample. forecast names the forecasted response variable responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding forecasted response paths with the name GDP_Response.

    Each path in Tbl.responseName_Response represents the continuation of the corresponding presample response path in Presample (Tbl.responseName_Response(1,:) occurs in the next time point, with respect to the periodicity of Presample, after the last presample response). Tbl.responseName_Response(j,k) contains the j-period-ahead forecasted response of path k.

  • The forecast MSE paths, which are in a numperiods-by-numpaths numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths, each corresponding to the forecasted responses in Tbl.responseName_Response. forecast names the forecast MSEs responseName_MSE, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding forecast MSE with the name GDP_MSE.

  • The forecasted unconditional disturbance paths, which are in a numperiods-by-numpaths numeric matrix, with rows representing periods in the forecast horizon and columns representing independent paths. forecast names the forecasted unconditional disturbance variable responseName_RegressionInnovation, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding forecasted unconditional disturbance paths with the name GDP_RegressionInnovation.

    Each path in Tbl.responseName_RegressionInnovation represents a continuation of the presample unconditional disturbance process, either supplied by or inferred from Presample, or set by default (Tbl.responseName_RegressionInnovation(1,:) occurs in the next time point, with respect to the periodicity of Presample, after the last presample unconditional disturbance). Tbl.responseName_RegressionInnovation(j,k) contains the j-period-ahead forecasted unconditional disturbance of path k.

  • When you supply InSample, Tbl contains all variables in InSample.

If Presample is a timetable, the following conditions hold:

  • The row order of Tbl, either ascending or descending, matches the row order of Presample.

  • Tbl.Time(1) is the next time after Presample.Time(end) relative to the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
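
The following sketch shows one way to read the variables of Tbl, assuming a release that supports table inputs (R2023b or later). The model, the presample values, and the presample variable name U0 are hypothetical choices. The output variable names are built from Mdl.SeriesName rather than hard-coded.

```matlab
% Hypothetical intercept-only model with AR(1) errors (illustrative values).
Mdl = regARIMA(Intercept=0,AR={0.5},Variance=1);

% Presample table holding two presample unconditional disturbances in a
% variable named U0 (the variable name is an arbitrary choice).
Presample = table([1; 2],VariableNames="U0");

numperiods = 3;
Tbl = forecast(Mdl,numperiods,Presample=Presample, ...
    PresampleRegressionDisturbanceVariable="U0");

% Build the output variable names from Mdl.SeriesName.
name = string(Mdl.SeriesName);
Yf   = Tbl.(name + "_Response");               % forecasted responses
Ymse = Tbl.(name + "_MSE");                    % forecast MSEs
Uf   = Tbl.(name + "_RegressionInnovation");   % forecasted disturbances
```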

More About

Time Base Partitions for Forecasting

Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a numperiods length partition at the end of the time base during which forecast generates forecasts Y from the dynamic model Mdl. The presample period is the entire partition occurring before the forecast period. forecast can require observed responses Y0, regression data X0, unconditional disturbances U0, or innovations E0 in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.

A common practice is to fit a dynamic model to a portion of the data set, then validate the predictability of the model by comparing its forecasts to observed responses. During forecasting, the presample period contains the data to which the model is fit, and the forecast period contains the holdout sample for validation. Suppose that yt is an observed response series; x1,t, x2,t, and x3,t are observed exogenous series; and time t = 1,…,T. Consider forecasting responses from a dynamic model of yt containing a regression component over numperiods = K periods. Suppose that the dynamic model is fit to the data in the interval [1,T − K] (for more details, see estimate). This figure shows the time base partitions for forecasting.

Time series plot showing the data for yt, x1t, x2t, and x3t over the presample period and forecast period.

For example, to generate forecasts Y from a regression model with AR(2) errors, forecast requires presample unconditional disturbances U0 and future predictor data XF.

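  • forecast infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model, Y0 = $\begin{bmatrix} y_{T-K-1} \\ y_{T-K} \end{bmatrix}$ and X0 = $\begin{bmatrix} x_{1,T-K-1} & x_{2,T-K-1} & x_{3,T-K-1} \\ x_{1,T-K} & x_{2,T-K} & x_{3,T-K} \end{bmatrix}$.

  • To include the regression component in the forecasts, forecast requires future exogenous data XF = $\begin{bmatrix} x_{1,(T-K+1):T} & x_{2,(T-K+1):T} & x_{3,(T-K+1):T} \end{bmatrix}$.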

This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.

Y0 forecast and X0 predictor data in Presample and Y forecast and corresponding XF predictor data are shown for Forecast sample
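
The partitioning described in this section can be sketched with simulated data. Everything in the example is an assumption made for illustration: the model coefficients, the series length T, the horizon K, and the random predictors. For brevity, the sketch forecasts from the known data-generating model rather than from a model fit to the presample period with estimate.

```matlab
rng(1)   % for reproducibility of the simulated data

% Hypothetical data-generating model with three predictors and AR(2)
% errors (all values are illustrative).
Mdl = regARIMA(Intercept=0,Beta=[1; -0.5; 2],AR={0.5 -0.2},Variance=1);

T = 100;              % length of the time base
K = 10;               % forecast horizon (numperiods)
X = randn(T,3);       % exogenous series x1, x2, and x3
y = simulate(Mdl,T,X=X);

% Presample period: times 1 through T - K.
% Forecast period: times T - K + 1 through T.
idxPre = 1:(T - K);
idxFor = (T - K + 1):T;

% An AR(2) error model needs only the two latest presample observations,
% but forecast accepts more rows and uses the latest required ones.
Y0 = y(idxPre);
X0 = X(idxPre,:);
XF = X(idxFor,:);

[Y,YMSE] = forecast(Mdl,K,Y0=Y0,X0=X0,XF=XF);

% Compare the forecasts with the holdout responses.
forecastError = y(idxFor) - Y;
```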

Algorithms

  • The forecast function sets the number of sample paths numpaths to the maximum number of columns among the specified presample data sets Y0, E0, and U0.

    All specified presample data sets must have either one column or numpaths > 1 columns. Otherwise, forecast issues an error. For example, if you supply Y0 and E0, and Y0 has five columns representing five paths, then E0 can have one column or five columns. If E0 has one column, forecast applies E0 to each path.

  • forecast computes the forecasted response MSEs by treating the predictor data matrices as nonstochastic and statistically independent of the model innovations. Therefore, the forecast MSEs reflect the variances associated with the unconditional disturbances of the ARIMA error model alone.

  • forecast uses presample response and predictor data to infer presample unconditional disturbances. Therefore, if you specify presample unconditional disturbances, forecast ignores any specified presample response and predictor data.
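
The rule for numpaths can be written out in a few lines of plain MATLAB. This is a sketch of the documented rule applied to made-up arrays, not the internal code of forecast. It mirrors the case in which Y0 has five columns and E0 has one.

```matlab
Y0 = randn(4,5);   % five presample response paths
E0 = randn(4,1);   % one presample innovation path, shared by all paths

% numpaths is the largest column count among the specified arrays.
counts = [size(Y0,2) size(E0,2)];
numpaths = max(counts);

% Every specified array must have one column or numpaths columns.
isValid = all(counts == 1 | counts == numpaths);

% Expand the single-column array so that every path has its own column,
% which is the effect of applying E0 to each path.
if size(E0,2) == 1
    E0 = repmat(E0,1,numpaths);
end
```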

Version History

Introduced in R2013b
