Need help with complicated loop to create several different models

Hey everyone,
I am creating an AR model and I want to calculate the AIC and BIC for every possible configuration of the model. Up to 24 lags are possible. So I want the AIC and BIC for each single lag, ARLag=1, ARLag=2, ARLag=3, ..., ARLag=24, and also for all consecutive ranges, ARLag=1:2, ARLag=1:3, ..., ARLag=1:24. Now the really complicated part: I also want the AIC and BIC for all other combinations of lags, like ARLag=[1 3], ARLag=[1 4], ..., ARLag=[1 24], ARLag=[1 2 4], ..., ARLag=[1 2 24], etc. So basically I want to know the AIC and BIC for each possible combination of lags so that I can choose the model with the minimal AIC and BIC!
Does anyone have an idea how I could do this?

Accepted Answer

Roger Wohlwend on 4 Nov 2014
Don't do it the complicated way. Use the AIC and the BIC only to find the model order, i.e. the highest lag of the model.
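for p = 1 : 24
    % Estimate a model of order p, i.e. one that contains all lags 1 to p
    ARLag = 1 : p;
    % ... estimate the model with these lags and store its AIC and BIC ...
end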
Calculate the AIC and the BIC for each p and then choose the model order p that minimizes them. Afterwards do the fine-tuning: estimate the AR(p) model and exclude lags whose coefficients are not significant. Re-estimate the model until all remaining coefficients are significant. The AIC and BIC are not needed to decide whether the lags 1 to p-1 belong in the model.
And don't forget to check the residuals. If they exhibit autocorrelation, the t-statistics are not valid and your model may not be appropriate.
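The order-selection loop described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: it assumes the Econometrics Toolbox (`arima`, `estimate`, `aicbic`), and the series `y` is simulated placeholder data, not the poster's photovoltaic data. Holding out the first 24 observations as a common presample is a design choice made here so that every order is fitted to the same observations and the criteria stay comparable.

```matlab
% Sketch: choose the AR order p by AIC/BIC (assumes Econometrics Toolbox).
% y is placeholder data simulated from an AR(2) process; replace it with your series.
rng(0);                                        % reproducible placeholder data
y = filter(1, [1 -0.6 0.3], randn(500, 1));

maxLag = 24;
aic = nan(maxLag, 1);
bic = nan(maxLag, 1);

% Use the first maxLag observations as presample for every model so that
% all log-likelihoods are computed on the same estimation sample.
y0   = y(1:maxLag);
yEst = y(maxLag+1:end);

for p = 1:maxLag
    Mdl = arima('ARLags', 1:p);                % AR model containing lags 1..p
    [~, ~, logL] = estimate(Mdl, yEst, 'Y0', y0, 'Display', 'off');
    numParam = p + 2;                          % p AR coefficients + constant + variance
    [aic(p), bic(p)] = aicbic(logL, numParam, numel(yEst));
end

[~, pAIC] = min(aic);                          % order with the smallest AIC
[~, pBIC] = min(bic);                          % order with the smallest BIC
fprintf('Order by AIC: %d, order by BIC: %d\n', pAIC, pBIC);
```

The BIC penalizes extra parameters more heavily than the AIC, so the two criteria may suggest different orders.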
  1 Comment
MC3105 on 4 Nov 2014
Thank you for your answer! I followed your suggestion and have another question I would like to ask you. I am creating the AR model to forecast electricity generation by photovoltaic plants. To identify the right lags for the model, I have the generation data of three sample photovoltaic plants. Later I want to apply my model to over 300 photovoltaic plants.
So I calculated AIC and BIC to determine the model order p for each of my three datasets. It is 23 every time. Then I excluded lags that were not significant for each dataset individually. Now I am left with different significant lags for each of my three sample photovoltaic plants:
Plant 1: Significant Lags are 1, 2, 3, 4, 23
Plant 2: Significant Lags are 1, 2, 4, 23
Plant 3: Significant Lags are 1, 2, 3, 4, 20, 23
I've got two options now: either I include ALL of these lags in my final model, or I include only the lags that are significant for ALL three of my sample plants.
I would be very grateful to get your opinion on this matter.
Can you also explain to me what you mean by checking the residuals? Do I check them at the end? Or every time I eliminate an insignificant lag to see how that affects my autocorrelation function? Do I also have to check the partial autocorrelation?


More Answers (1)

Roger Wohlwend on 5 Nov 2014
Hm ... If you want a model for all plants then the best solution is to leave all lags in the model.
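To make the two options concrete, here is a minimal sketch using the lag sets reported in the comment above. The variable names are made up for illustration. "All lags" corresponds to the union of the three sets, while "only lags significant for every plant" corresponds to their intersection.

```matlab
% Significant lags per sample plant, as reported in the comment above
lags1 = [1 2 3 4 23];
lags2 = [1 2 4 23];
lags3 = [1 2 3 4 20 23];

% Option 1: keep every lag that is significant for at least one plant
allLags = union(union(lags1, lags2), lags3);             % [1 2 3 4 20 23]

% Option 2: keep only the lags that are significant for all three plants
commonLags = intersect(intersect(lags1, lags2), lags3);  % [1 2 4 23]
```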
If you have time, you could estimate models for all 300 plants, not just for the 3 sample plants. That sounds like a lot of work, but it is not. Create a loop in which you do the calculation for all plants. That way you can verify whether the highest lag is 23 for all plants, and you can check whether using one model for all plants is really a good idea.
Yes, you have to check the residuals at the end. Find out if they exhibit autocorrelation. They should not. Just check the autocorrelation function or use the function lbqtest. No, you don't need the partial autocorrelation function.
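A loop over all plants might look roughly like the sketch below. It assumes the Econometrics Toolbox and a hypothetical T-by-nPlants matrix `gen` with one column of generation data per plant. Here `gen` is filled with simulated placeholder data and only a handful of columns so the example runs quickly. Only the BIC is used, to keep the sketch short.

```matlab
% Sketch: repeat the order selection for every plant (assumes Econometrics Toolbox).
% gen is a hypothetical T-by-nPlants matrix; simulated placeholder data here.
rng(1);
nPlants = 5;                                   % with real data: size(gen, 2)
gen = filter(1, [1 -0.5 0.2], randn(400, nPlants));   % filters each column

maxLag = 24;
bestOrder = zeros(nPlants, 1);

for k = 1:nPlants
    y0   = gen(1:maxLag, k);                   % common presample for all orders
    yEst = gen(maxLag+1:end, k);
    bic  = nan(maxLag, 1);
    for p = 1:maxLag
        Mdl = arima('ARLags', 1:p);            % AR model containing lags 1..p
        [~, ~, logL] = estimate(Mdl, yEst, 'Y0', y0, 'Display', 'off');
        [~, bic(p)] = aicbic(logL, p + 2, numel(yEst));
    end
    [~, bestOrder(k)] = min(bic);              % BIC-optimal order for plant k
end

% counts(p) = number of plants for which order p was selected
counts = accumarray(bestOrder, 1, [maxLag 1]);
disp(bestOrder');
```

If `counts` is concentrated on a single order, one common model order for all plants looks more defensible. A wide spread would argue against it.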
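The residual check could be sketched like this, again assuming the Econometrics Toolbox (`arima`, `estimate`, `infer`, `lbqtest`, `autocorr`). The data and the final lag set `lags` are placeholders chosen for illustration, as is the choice of 20 lags for the test.

```matlab
% Sketch: Ljung-Box test on the residuals of a fitted AR model
% (assumes Econometrics Toolbox; data and lag set are placeholders).
rng(2);
y = filter(1, [1 -0.6 0.3], randn(500, 1));    % placeholder series

lags = [1 2];                                  % hypothetical final lag set
Mdl  = arima('ARLags', lags);
y0   = y(1:max(lags));                         % presample observations
yEst = y(max(lags)+1:end);
EstMdl = estimate(Mdl, yEst, 'Y0', y0, 'Display', 'off');

res = infer(EstMdl, yEst, 'Y0', y0);           % model residuals

% h = 0 means no autocorrelation was detected up to lag 20 (at the 5% level)
[h, pValue] = lbqtest(res, 'Lags', 20);
fprintf('Ljung-Box: h = %d, p-value = %.3f\n', h, pValue);

autocorr(res);                                 % plot the residual ACF
```

A rejection (h = 1) suggests the residuals are still autocorrelated, in which case the model probably needs additional lags.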
