forecast
Class: regARIMA
Forecast responses of regression model with ARIMA errors
Syntax
[Y,YMSE] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)
Description
[Y,YMSE] = forecast(Mdl,numperiods) forecasts responses (Y) for a regression model with ARIMA time series errors and generates corresponding mean square errors (YMSE).
[Y,YMSE,U] = forecast(Mdl,numperiods) additionally forecasts unconditional disturbances for a regression model with ARIMA errors.
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value) forecasts with additional options specified by one or more Name,Value pair arguments.
Input Arguments
numperiods
— Forecast horizon
positive integer
Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
E0
— Presample innovations
numeric column vector | numeric matrix
Presample innovations that initialize the moving average (MA) component of the ARIMA error model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector or numeric matrix. forecast assumes that the presample innovations have a mean of 0.
- If E0 is a column vector, then forecast applies it to each forecasted path.
- If E0, Y0, and U0 are matrices with multiple paths, then they must have the same number of columns.
- E0 requires at least Mdl.Q rows. If E0 contains extra rows, then forecast uses the latest presample innovations. The last row contains the latest presample innovation.
By default, if U0 contains at least Mdl.P + Mdl.Q rows, then forecast infers E0 from U0. If U0 has an insufficient number of rows, and forecast cannot infer sufficient observations of U0 from the presample data (Y0 and X0), then E0 is 0.
Data Types: double
U0
— Presample unconditional disturbances
numeric column vector | numeric matrix
Presample unconditional disturbances that initialize the autoregressive (AR) component of the ARIMA error model, specified as the comma-separated pair consisting of 'U0' and a numeric column vector or numeric matrix. If you do not specify presample innovations E0, forecast uses U0 to infer them.
- If U0 is a column vector, then forecast applies it to each forecasted path.
- If U0, Y0, and E0 are matrices with multiple paths, then they must have the same number of columns.
- U0 requires at least Mdl.P rows. If U0 contains extra rows, then forecast uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.
By default, if the presample data (Y0 and X0) contains at least Mdl.P rows, then forecast infers U0 from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are 0.
Data Types: double
X0
— Presample predictor data
numeric matrix
Presample predictor data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'X0' and a numeric matrix. The columns of X0 are separate time series variables. forecast uses X0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores X0.
- If you do not specify U0, then X0 requires at least Mdl.P rows to infer U0. If X0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation of each series.
- X0 requires the same number of columns as the length of Mdl.Beta.
- If you specify X0, then you must also specify XF.
- forecast treats X0 as a fixed (nonstochastic) matrix.
Data Types: double
XF
— Forecasted or future predictor data
numeric matrix
Forecasted or future predictor data, specified as the comma-separated pair consisting of 'XF' and a numeric matrix.
The columns of XF are separate time series, each corresponding to forecasts of the series in X0. Row t of XF contains the t-period-ahead forecasts of X0.
If you specify X0, then you must also specify XF. XF and X0 require the same number of columns. XF must have at least numperiods rows. If XF exceeds numperiods rows, then forecast uses the first numperiods forecasts.
forecast treats XF as a fixed (nonstochastic) matrix.
By default, forecast does not include a regression component in the model, regardless of the presence of regression coefficients in Mdl.
Data Types: double
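The XF rules above can be checked with a small sketch. It assumes Econometrics Toolbox is available; the model coefficients and the random predictor matrix XFlong are illustrative values, not taken from this page.

```matlab
% Fully specified model with AR(1) errors and two regression coefficients
% (all parameter values are made up for illustration).
Mdl = regARIMA('Intercept',1,'AR',0.6,'Beta',[0.1 -0.2],'Variance',0.5);

rng(0);                       % For reproducibility
K = 8;                        % Forecast horizon
XFlong = randn(12,2);         % More rows than the horizon

Yextra = forecast(Mdl,K,'XF',XFlong);          % Extra rows supplied
Yexact = forecast(Mdl,K,'XF',XFlong(1:K,:));   % Exactly K rows supplied
Ynoreg = forecast(Mdl,K);                      % No XF: regression term omitted

% Only the first K rows of XF should matter.
gapRows = max(abs(Yextra - Yexact));

% With no presample data, the two forecasts should differ by XF*Beta.
gapReg = max(abs(Yextra - Ynoreg - XFlong(1:K,:)*Mdl.Beta(:)));
```

Both gapRows and gapReg are expected to be zero up to floating-point error, which reflects that Mdl.Beta alone does not add a regression component to the forecast.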
Y0
— Presample response data
numeric column vector | numeric matrix
Presample response data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector or numeric matrix. forecast uses Y0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores Y0.
- If Y0 is a column vector, forecast applies it to each forecasted path.
- If Y0, E0, and U0 are matrices with multiple paths, then they must have the same number of columns.
- If you do not specify U0, then Y0 requires at least Mdl.P rows to infer U0. If Y0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation.
Data Types: double
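As a rough illustration of how Y0 and X0 stand in for U0, the sketch below computes the disturbances by hand and passes them directly. It assumes Econometrics Toolbox and uses the regression relation y_t = c + X_t*beta + u_t implied by the model; the parameter values and simulated data are hypothetical.

```matlab
% Regression model with AR(2) errors (illustrative parameter values).
Mdl = regARIMA('Intercept',2,'AR',{0.5 -0.3},'Beta',[0.1 -0.2],'Variance',0.5);

rng(0);                                % For reproducibility
X  = randn(25,2);                      % 20 presample rows + 5 future rows
X0 = X(1:20,:);
XF = X(21:25,:);
y0 = simulate(Mdl,20,'X',X0);          % Presample responses

% Let forecast infer U0 from the presample responses and predictors.
Yinferred = forecast(Mdl,5,'Y0',y0,'X0',X0,'XF',XF);

% Compute the unconditional disturbances manually and supply them as U0.
u0 = y0 - Mdl.Intercept - X0*Mdl.Beta(:);
Ymanual = forecast(Mdl,5,'U0',u0,'XF',XF);

maxGap = max(abs(Yinferred - Ymanual));
```

The two forecasts should agree up to floating-point error, so supplying U0 is an alternative to supplying Y0 and X0 when the disturbances are already known.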
Notes
- NaNs in E0, U0, X0, XF, and Y0 indicate missing values and forecast removes them. The software merges the presample data sets (E0, U0, X0, and Y0), then uses list-wise deletion to remove any NaNs. forecast similarly removes NaNs from XF. Removing NaNs in the data reduces the sample size. Such removal can also create irregular time series.
- forecast assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
- Set X0 to the same predictor matrix as X used in the estimation, simulation, or inference of Mdl. This assignment ensures correct inference of the unconditional disturbances, U0.
- To include a regression component in the response forecast, you must specify the forecasted predictor data XF. That is, you can specify XF without also specifying X0, but forecast issues an error when you specify X0 without also specifying XF.
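The last note can be exercised with a short sketch that catches the error rather than letting it stop execution. It assumes Econometrics Toolbox; the model and the random presample data are hypothetical, and the exact error message is not reproduced here because it may vary by release.

```matlab
% Model with one regression coefficient (illustrative values).
Mdl = regARIMA('Intercept',0,'AR',0.5,'Beta',0.3,'Variance',1);

rng(0);                 % For reproducibility
Y0 = randn(10,1);       % Presample responses
X0 = randn(10,1);       % Presample predictor data

% Specifying X0 without XF is documented to produce an error.
errored = false;
try
    forecast(Mdl,3,'Y0',Y0,'X0',X0);
catch err
    errored = true;
    disp(err.message)   % Show whatever message the installed release issues
end

% Specifying XF without X0 is allowed.
Yok = forecast(Mdl,3,'XF',randn(3,1));
```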
Output Arguments
Y
— Minimum mean square error forecasts of response data
numeric matrix
Minimum mean square error (MMSE) forecasts of the response data, returned as a numeric matrix. Y has numperiods rows and numPaths columns.
- If you do not specify Y0, E0, and U0, then Y is a numperiods column vector.
- If you specify Y0, E0, and U0, all having numPaths columns, then Y is a numperiods-by-numPaths matrix.
- Row i of Y contains the forecasts for the ith period.
Data Types: double
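A minimal sketch of the output dimensions, assuming Econometrics Toolbox: the model below has no regression component, and the three-column presample matrix Y0 is random, hypothetical data representing three paths.

```matlab
% Regression model with AR(2) errors and no predictors (illustrative values).
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.3},'Variance',1);

numperiods = 5;
numPaths = 3;

rng(0);                         % For reproducibility
Y0 = randn(Mdl.P,numPaths);     % Mdl.P presample rows, one column per path

[Y,YMSE,U] = forecast(Mdl,numperiods,'Y0',Y0);

% Without presample data, the outputs are column vectors.
Ysingle = forecast(Mdl,numperiods);

sizeY = size(Y);
sizeSingle = size(Ysingle);
```

Under the rules above, sizeY should be [5 3] and sizeSingle should be [5 1].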
YMSE
— Mean square errors of forecasted responses
numeric matrix
Mean square errors (MSEs) of the forecasted responses, returned as a numeric matrix. YMSE has numperiods rows and numPaths columns.
- If you do not specify Y0, E0, and U0, then YMSE is a numperiods column vector.
- If you specify Y0, E0, and U0, all having numPaths columns, then YMSE is a numperiods-by-numPaths matrix.
- Row i of YMSE contains the forecast error variances for the ith period.
- The predictor data does not contribute variability to YMSE because forecast treats XF as a nonstochastic matrix.
- The square roots of YMSE are the standard errors of the forecasts of Y.
Data Types: double
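The standard-error interpretation of YMSE can be sketched as follows. The example assumes Econometrics Toolbox and Gaussian innovations, uses illustrative parameter values, and hard-codes 1.96 as the approximate normal critical value for a 95% interval.

```matlab
% Regression model with ARMA(1,1) errors and no predictors (illustrative values).
Mdl = regARIMA('Intercept',0,'AR',0.5,'MA',0.3,'Variance',2);

[Y,YMSE] = forecast(Mdl,10);

se = sqrt(YMSE);        % Standard errors of the forecasts
z  = 1.96;              % Approximate 97.5th percentile of the standard normal

lower = Y - z*se;       % Approximate 95% forecast interval bounds
upper = Y + z*se;

% The one-step-ahead MSE equals the innovation variance of the model.
oneStepGap = abs(YMSE(1) - Mdl.Variance);
```

Because the error model here is stationary, the interval widths increase with the horizon and level off near the unconditional variance of the disturbances.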
U
— Minimum mean square error forecasts of future ARIMA error model unconditional disturbances
numeric matrix
Minimum mean square error (MMSE) forecasts of future ARIMA error model unconditional disturbances, returned as a numeric matrix. U has numperiods rows and numPaths columns.
- If you do not specify Y0, E0, and U0, then U is a numperiods column vector.
- If you specify Y0, E0, and U0, all having numPaths columns, then U is a numperiods-by-numPaths matrix.
- Row i of U contains the forecasted unconditional disturbances for the ith period.
Data Types: double
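One way to see how U relates to Y is to rebuild the response forecast from its parts. The sketch assumes Econometrics Toolbox and the regression relation y_t = c + X_t*beta + u_t; the parameter values and simulated data are hypothetical.

```matlab
% Regression model with AR(1) errors (illustrative parameter values).
Mdl = regARIMA('Intercept',1,'AR',0.6,'Beta',[0.1 -0.2],'Variance',0.5);

rng(0);                                % For reproducibility
X  = randn(30,2);
X0 = X(1:25,:);                        % Presample predictors
XF = X(26:30,:);                       % Future predictors
y0 = simulate(Mdl,25,'X',X0);          % Presample responses

[Y,~,U] = forecast(Mdl,5,'Y0',y0,'X0',X0,'XF',XF);

% Response forecast = intercept + regression component + disturbance forecast.
Ycheck = Mdl.Intercept + XF*Mdl.Beta(:) + U;
maxGap = max(abs(Y - Ycheck));
```

maxGap should be zero up to floating-point error, which shows that U carries all of the dynamics of the forecast while XF contributes a fixed shift.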
Examples
Forecast Responses of a Regression Model with ARIMA Errors
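Forecast responses from the following regression model with ARMA(2,1) errors over a 30-period horizon:
$$y_t = X_t \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix} + u_t$$
$$u_t = 0.5u_{t-1} - 0.8u_{t-2} + \varepsilon_t - 0.5\varepsilon_{t-1},$$
where $\varepsilon_t$ is Gaussian with variance 0.1.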
Specify the model. Simulate responses from the model and two predictor series.
    Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
        'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
    rng(1); % For reproducibility
    X = randn(130,2);
    y = simulate(Mdl0,130,'X',X);
Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.
    Mdl = regARIMA('ARLags',1:2);
    EstMdl = estimate(Mdl,y(1:100),'X',X(1:100,:));

    Regression with ARMA(2,0) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________
    Intercept    0.004358      0.021314         0.20446        0.83799
    AR{1}         0.36833      0.067103          5.4891     4.0408e-08
    AR{2}        -0.75063      0.090865         -8.2609     1.4453e-16
    Beta(1)      0.076398      0.023008          3.3205     0.00089863
    Beta(2)       -0.1396      0.023298         -5.9919     2.0741e-09
    Variance     0.079876       0.01342          5.9522     2.6453e-09
EstMdl is a new regARIMA model containing the estimates. The estimates are close to their true values.
Use EstMdl to forecast a 30-period horizon. Visually compare the forecasts to the holdout data using a plot.
    [yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
        'X0',X(1:100,:),'XF',X(101:end,:));

    figure
    plot(y,'Color',[.7,.7,.7]);
    hold on
    plot(101:130,yF,'b','LineWidth',2);
    plot(101:130,yF+1.96*sqrt(yMSE),'r:',...
        'LineWidth',2);
    plot(101:130,yF-1.96*sqrt(yMSE),'r:','LineWidth',2);
    h = gca;
    ph = patch([repmat(101,1,2) repmat(130,1,2)],...
        [h.YLim fliplr(h.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend('Observed','Forecast',...
        '95% Forecast Interval','Location','Best');
    title(['30-Period Forecasts and Approximate 95% '...
        'Forecast Intervals'])
    axis tight
    hold off
Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:
- The predictors are randomly generated in this example. estimate treats the predictors as fixed. The 95% forecast intervals based on the estimates from estimate do not account for the variability in the predictors.
- By sheer chance, the estimation period seems less volatile than the forecast period. estimate uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on the estimates are not expected to cover observations that have an underlying innovations process with larger variability.
Forecast the GDP Using Regression Model with ARMA Errors
Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.
Load the U.S. macroeconomic data set and preprocess the data.
    load Data_USEconModel;
    logGDP = log(DataTimeTable.GDP);
    dlogGDP = diff(logGDP);               % For stationarity
    dCPI = diff(DataTimeTable.CPIAUCSL);  % For stationarity
    numObs = length(dlogGDP);
    gdp = dlogGDP(1:end-15);              % Estimation sample
    cpi = dCPI(1:end-15);
    T = length(gdp);                      % Effective sample size
    frstHzn = T+1:numObs;                 % Forecast horizon
    hoCPI = dCPI(frstHzn);                % Holdout sample
    dts = DataTimeTable.Time(2:end);
Fit a regression model with ARMA(1,1) errors.
    Mdl = regARIMA('ARLags',1,'MALags',1);
    EstMdl = estimate(Mdl,gdp,'X',cpi);

    Regression with ARMA(1,1) Error Model (Gaussian Distribution):

                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________
    Intercept      0.014793      0.0016289        9.0818      1.0684e-19
    AR{1}           0.57601        0.10009        5.7548      8.6755e-09
    MA{1}          -0.15258        0.11978       -1.2738         0.20272
    Beta(1)       0.0028972      0.0013989         2.071        0.038355
    Variance     9.5734e-05     6.5562e-06        14.602       2.723e-48
Forecast the GDP rate over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.
    [gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',gdp,...
        'X0',cpi,'XF',hoCPI);
Plot the forecasts and 95% forecast intervals.
    figure
    h1 = plot(dts(end-65:end),dlogGDP(end-65:end),...
        'Color',[.7,.7,.7]);
    datetick
    hold on
    h2 = plot(dts(frstHzn),gdpF,'b','LineWidth',2);
    h3 = plot(dts(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    plot(dts(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:','LineWidth',2);
    ha = gca;
    title('{\bf GDP Rate Forecasts and Approximate 95% Intervals}')
    ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
        [ha.YLim fliplr(ha.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend([h1 h2 h3],{'Observed GDP rate','Forecasted GDP rate ',...
        '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
    axis tight
    hold off
Forecast Regression Model with ARIMA Errors With Known Intercept
Forecast unit root nonstationary, log GDP using a regression model with ARIMA(1,1,1) errors, including CPI as a predictor and a known intercept.
Load the U.S. macroeconomic data set and preprocess the data.
    load Data_USEconModel;
    numObs = length(DataTimeTable.GDP);
    logGDP = log(DataTimeTable.GDP(1:end-15));
    cpi = DataTimeTable.CPIAUCSL(1:end-15);
    T = length(logGDP);                       % Effective sample size
    frstHzn = T+1:numObs;                     % Forecast horizon
    hoCPI = DataTimeTable.CPIAUCSL(frstHzn);  % Holdout sample
    dt = DataTimeTable.Time;
Specify the model for the estimation period.
Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);
The intercept is not identifiable in a model with integrated errors, so fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression.
    Reg4Int = [ones(T,1), cpi]\logGDP;
    intercept = Reg4Int(1);
Consider performing a sensitivity analysis by using a grid of intercepts.
Set the intercept and fit the regression model with ARIMA(1,1,1) errors.
    Mdl.Intercept = intercept;
    EstMdl = estimate(Mdl,logGDP,'X',cpi,'Display','off')

    EstMdl =
      regARIMA with properties:

         Description: "ARIMA(1,1,1) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 5.80142
                Beta: [0.00396698]
                   P: 2
                   D: 1
                   Q: 1
                  AR: {0.922709} at lag [1]
                 SAR: {}
                  MA: {-0.387843} at lag [1]
                 SMA: {}
            Variance: 0.000108943

       Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution)
Forecast GDP over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.
    [gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',logGDP,...
        'X0',cpi,'XF',hoCPI);
Plot the forecasts and 95% forecast intervals.
    figure
    h1 = plot(dt(end-65:end),log(DataTimeTable.GDP(end-65:end)),...
        'Color',[.7,.7,.7]);
    hold on
    h2 = plot(dt(frstHzn),gdpF,'b','LineWidth',2);
    h3 = plot(dt(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    plot(dt(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:',...
        'LineWidth',2);
    ha = gca;
    title('{\bf Log GDP Forecasts and Approximate 95% Intervals}')
    ph = patch([repmat(dt(frstHzn(1)),1,2) repmat(dt(frstHzn(end)),1,2)],...
        [ha.YLim fliplr(ha.YLim)],...
        [0 0 0 0],'b');
    ph.FaceAlpha = 0.1;
    legend([h1 h2 h3],{'Observed GDP','Forecasted GDP',...
        '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
    axis tight
    hold off
The unconditional disturbances, $u_t$, are nonstationary; therefore, the widths of the forecast intervals grow with time.
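The effect of integration on interval width can be seen without any data by comparing forecast MSEs from two fully specified models. This is a sketch assuming Econometrics Toolbox; both models and their parameter values are hypothetical and differ only in the degree of integration D.

```matlab
% Same ARMA(1,1) structure, with and without one degree of integration.
MdlIntegrated = regARIMA('Intercept',0,'AR',0.5,'MA',0.3,'D',1,'Variance',1);
MdlStationary = regARIMA('Intercept',0,'AR',0.5,'MA',0.3,'Variance',1);

numperiods = 20;
[~,mseIntegrated] = forecast(MdlIntegrated,numperiods);
[~,mseStationary] = forecast(MdlStationary,numperiods);

% Approximate 95% interval half-widths at each horizon.
halfWidthIntegrated = 1.96*sqrt(mseIntegrated);
halfWidthStationary = 1.96*sqrt(mseStationary);

% Ratio of the half-widths at the end of the horizon.
ratioAtEnd = halfWidthIntegrated(end)/halfWidthStationary(end);
```

The half-widths for the integrated model keep increasing with the horizon, while those for the stationary model flatten out, so ratioAtEnd exceeds 1.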
More About
Time Base Partitions for Forecasting
Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a numperiods length partition at the end of the time base during which forecast generates forecasts Y from the dynamic model Mdl. The presample period is the entire partition occurring before the forecast period. forecast can require observed responses Y0, regression data X0, unconditional disturbances U0, or innovations E0 in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.
A common practice is to fit a dynamic model to a portion of the data set, then
validate the predictability of the model by comparing its forecasts to observed
responses. During forecasting, the presample period contains the data to which the
model is fit, and the forecast period contains the holdout sample for validation.
Suppose that $y_t$ is an observed response series; $x_{1,t}$, $x_{2,t}$, and $x_{3,t}$ are observed exogenous series; and time t = 1,…,T. Consider forecasting responses from a dynamic model of $y_t$ containing a regression component numperiods = K periods. Suppose that the dynamic model is fit to the data in the interval [1,T – K] (for more details, see estimate). This figure shows the time base partitions for forecasting.
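For example, to generate forecasts Y from a regression model with AR(2) errors, forecast requires presample unconditional disturbances U0 and future predictor data XF.
- forecast infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model, Y0 = $\begin{bmatrix} y_{T-K-1} & y_{T-K} \end{bmatrix}'$ and X0 = $\begin{bmatrix} x_{1,T-K-1} & x_{2,T-K-1} & x_{3,T-K-1} \\ x_{1,T-K} & x_{2,T-K} & x_{3,T-K} \end{bmatrix}$.
- To model the regression component, forecast requires future exogenous data XF = $\begin{bmatrix} x_{1,(T-K+1):T} & x_{2,(T-K+1):T} & x_{3,(T-K+1):T} \end{bmatrix}$.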
This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.
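The partitioning described in this section can be written out as index arithmetic. The sketch below uses only base MATLAB; T, K, and the random series are placeholders, and P = 2 corresponds to the AR(2) error model in the example above.

```matlab
T = 100;                       % Length of the full time base
K = 10;                        % Forecast horizon (numperiods)
P = 2;                         % Presample observations an AR(2) error model needs

rng(0);                        % For reproducibility
y = randn(T,1);                % Placeholder response series
X = randn(T,3);                % Placeholder exogenous series x1, x2, x3

presampleIdx = 1:(T-K);        % Presample (estimation) period
forecastIdx  = (T-K+1):T;      % Forecast period

% The latest P presample observations initialize the error model.
Y0 = y(presampleIdx(end-P+1:end));       % [y(T-K-1); y(T-K)]
X0 = X(presampleIdx(end-P+1:end),:);     % Matching rows of the predictors

% Future predictor data covers the whole forecast period.
XF = X(forecastIdx,:);
```

Passing the entire presample (y(presampleIdx) and X(presampleIdx,:)) gives the same initialization, because forecast uses only the latest required rows.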
Algorithms
- forecast computes the forecasted response MSEs, YMSE, by treating the predictor data matrices (X0 and XF) as nonstochastic and statistically independent of the model innovations. Therefore, YMSE reflects the variance associated with the unconditional disturbances of the ARIMA error model alone.
- forecast uses Y0 and X0 to infer U0. Therefore, if you specify U0, forecast ignores Y0 and X0.
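The second point can be checked with a sketch in which Y0 and X0 are deliberately unrelated to U0. It assumes Econometrics Toolbox; all values are random, hypothetical data, and the model parameters are illustrative.

```matlab
% Regression model with AR(2) errors and one predictor (illustrative values).
Mdl = regARIMA('Intercept',1,'AR',{0.5 -0.3},'Beta',0.4,'Variance',1);

rng(0);                  % For reproducibility
U0 = randn(2,1);         % Presample unconditional disturbances (Mdl.P rows)
XF = randn(4,1);         % Future predictor data
Y0 = randn(2,1);         % Presample responses, unrelated to U0
X0 = randn(2,1);         % Presample predictors, unrelated to U0

% Forecast from U0 alone, then again with Y0 and X0 also supplied.
[Ya,YMSEa] = forecast(Mdl,4,'U0',U0,'XF',XF);
[Yb,YMSEb] = forecast(Mdl,4,'U0',U0,'Y0',Y0,'X0',X0,'XF',XF);

maxGapY   = max(abs(Ya - Yb));
maxGapMSE = max(abs(YMSEa - YMSEb));
```

If Y0 and X0 are ignored as described, both gaps are zero up to floating-point error.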