# How to implement cross-validation in a neural network for time series prediction

BR on 23 Aug 2017
Edited: Greg Heath on 19 Jan 2018
I am using k-fold cross-validation to train a neural network to predict a time series. I have an input time series and I am using the Nonlinear Autoregressive tool for time series. I am using 10-fold cross-validation and dividing the data set into 70% training, 15% validation and 15% testing. But I really don't know how to write the code.
To be honest, this is the first time I am using neural networks, so please keep your explanation simple!
This is what I wrote:
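% densig: the time series to be predicted (assumed to be in the workspace)
X = tonndata(densig, true, false);   % convert to cell-array time series
T = tonndata(densig, true, false);
trainFcn = 'trainlm';
inputDelays = 1:2;
feedbackDelays = 1:2;
hiddenLayerSize = [50 20 20];
k = 10;                              % number of folds
perfTst = zeros(1, k);               % test MSE of each fold
for i = 1:k
    net = narxnet(inputDelays, feedbackDelays, hiddenLayerSize, 'open', trainFcn);
    [x, xi, ai, t] = preparets(net, X, {}, T);
    % crossvalind (Bioinformatics Toolbox) assigns timesteps to folds at
    % random, which breaks the time ordering. Use contiguous blocks instead:
    % block i is the test set, the next block is the validation set and the
    % rest is training (roughly 80/10/10 rather than 70/15/15).
    Q = numel(t);
    edges = round(linspace(0, Q, k + 1));
    tstInd = edges(i) + 1 : edges(i + 1);
    j = mod(i, k) + 1;
    valInd = edges(j) + 1 : edges(j + 1);
    net.divideFcn = 'divideind';
    net.divideParam.trainInd = setdiff(1:Q, [tstInd valInd]);
    net.divideParam.valInd = valInd;
    net.divideParam.testInd = tstInd;
    net.trainParam.epochs = 5000;
    [net, tr] = train(net, x, t, xi, ai);
    y = net(x, xi, ai);
    perfTst(i) = perform(net, t(tstInd), y(tstInd));   % MSE on the held-out block
end
performance = mean(perfTst)          % average test MSE over the k folds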
Thanks Baqar
Greg Heath on 23 Aug 2017
The quickest way to get NN help is to run your program on one or more of the MATLAB examples from
doc nndatasets
and/or
help nndatasets
after initializing using
rng('default') % same as rng(0).
Hope this helps.
Greg

Greg Heath on 23 Aug 2017
Edited: Greg Heath on 24 Aug 2017
UH-OH! I do not have crossvalind.
CROSSVALIND IS NOT IN THE NN TOOLBOX!!!
However, I have posted cross-validation results in both the NEWSGROUP and ANSWERS.
Your problem is doubly troubling because there are very few references that use cross-validation with EITHER NNs OR TIMESERIES!!!
My search yields the following number of hits:
NEURAL 4319 5130
TIMESERIES 604 1696
NEURAL TIMESERIES 87 344
CROSSVAL 51 119
CROSSVAL NEURAL 9 19
CROSSVAL TIMESERIES 0 3
The main reasons for so few examples are that
1. It IS VERY MUCH EASIER AND NO LESS VALID to design NNs with multiple random data divisions.
2. TIMESERIES REQUIRE CONSTANT TIMESTEPS, so the data cannot simply be shuffled into random folds, and the number of relevant arrangements is severely limited.
3. The best way to get many design variations is merely to use many trials with random initial weights.
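Point 2 can be made concrete with a small base-MATLAB sketch (no toolboxes needed). It deals N = 100 timesteps into 10 folds at random, which is roughly what a random k-fold partition does, and compares the spacing of the time indices inside one such fold with the spacing inside a contiguous block. N and k are arbitrary illustrative values, not taken from the question.

```matlab
% Sketch: random k-fold assignment vs. contiguous blocks for a time series.
% N and k are arbitrary illustrative values.
N = 100;                       % number of timesteps
k = 10;                        % number of folds

% Random assignment: shuffle the time indices and deal them into k folds
perm = randperm(N);
foldOf = zeros(1, N);
foldOf(perm) = mod(0:N-1, k) + 1;     % each fold gets N/k timesteps
randomFold = find(foldOf == 1);       % sorted time indices in fold 1

% Contiguous assignment: fold j is simply the j-th block of N/k timesteps
edges = round(linspace(0, N, k + 1));
blockFold = edges(1) + 1 : edges(2);  % time indices in block 1

% Spacing between consecutive timesteps inside each fold
disp(diff(randomFold))   % irregular gaps: the constant timestep is lost
disp(diff(blockFold))    % all ones: the constant timestep is preserved
```

Only the contiguous block keeps the equal spacing that the tapped delay lines of a time series net rely on.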
Hope this helps.
Thank you for formally accepting my answer
Greg
Greg Heath on 25 Aug 2017
I don't think you understand:
Use a MATLAB example dataset and initialize the rng to the zero state so that we can compare our results with yours.
Greg

Greg Heath on 31 Dec 2017
If this is the 1st time you are using neural networks:
1. BOTH TIMESERIES AND CROSSVALIDATION ARE ADVANCED TOPICS. IF YOU HAVE A CHOICE, START WITH ELEMENTARY TOPICS:
a. Regression/Function-Fitting
help fitnet
doc fitnet
b. Classification/Target-Identification
help patternnet
doc patternnet
c. Non-feedback Timeseries
help timedelaynet
doc timedelaynet
d. Feedback Timeseries
help narxnet
doc narxnet
2. I don't recommend crossvalidation for neural networks.
a. Multiple random weight initializations for each candidate number of hidden nodes in a single-hidden-layer net tend to be sufficient and are orders of magnitude faster.
b. The goal is to minimize the number of hidden nodes subject to an upper limit on the mean square error (or cross-entropy for classification).
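The goal in 2b can be written as a simple selection rule. The base-MATLAB sketch below assumes you have already filled a matrix NMSE(j, trial) by training one net per hidden-layer size Hvec(j) and per random weight initialization (for example with fitnet or narnet in a double loop). The names Hvec, NMSE and NMSEgoal are hypothetical, the matrix values are made up purely to exercise the rule, and the goal of 0.01 (Rsquare >= 0.99) is an assumed design target.

```matlab
% Sketch: choose the smallest hidden layer that meets an error goal.
% Hvec, NMSE and NMSEgoal are hypothetical; in a real design each NMSE
% entry would come from training a net with Hvec(j) hidden nodes from a
% different random weight initialization.
Hvec = [1 2 4 6 8 10];                 % candidate numbers of hidden nodes
NMSE = [0.30  0.25  0.40;              % rows: Hvec(j); columns: trials
        0.09  0.05  0.12;              % (made-up values for illustration)
        0.02  0.008 0.03;
        0.006 0.004 0.009;
        0.005 0.003 0.004;
        0.004 0.002 0.003];
NMSEgoal = 0.01;                       % assumed upper limit on NMSE

bestPerH = min(NMSE, [], 2);           % best trial for each candidate H
ok = find(bestPerH <= NMSEgoal, 1);    % first (smallest) H meeting the goal
if isempty(ok)
    disp('No candidate met the goal; try more trials or larger H.')
    Hchosen = NaN;
else
    Hchosen = Hvec(ok);
    [~, bestTrial] = min(NMSE(ok, :));
    fprintf('Smallest H meeting the goal: %d (trial %d)\n', Hchosen, bestTrial);
end
```

Taking the best trial per H reflects the idea that random initializations are cheap design variations; a stricter rule could require the median trial to meet the goal instead.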
Hope this helps.
Greg

orlem lima dos santos on 19 Jan 2018
Hi again. I do not recommend using standard cross-validation (the crossval function) for time series prediction. For this type of problem there is a technique known as "time series cross-validation" (https://robjhyndman.com/hyndsight/tscv/). Unfortunately, there is no function implemented for it in MATLAB, but there is one in Python's scikit-learn (<http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html>) that can help.
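The expanding-window idea behind time series cross-validation only needs index arithmetic, so it can be sketched in base MATLAB. This sketch is loosely modeled on the default behavior of scikit-learn's TimeSeriesSplit as I understand it (equal-sized test blocks at the end of the series, each training set being everything before its test block). Treat it as an approximation, not a port. N and nSplits are illustrative values.

```matlab
% Sketch: expanding-window ("rolling origin") splits for time series CV.
% Each training set contains only timesteps that precede its test block.
N = 20;                              % number of timesteps (illustrative)
nSplits = 4;                         % number of train/test splits
testSize = floor(N / (nSplits + 1)); % equal-sized test blocks
firstTest = N - nSplits * testSize + 1;

trainIdx = cell(1, nSplits);
testIdx = cell(1, nSplits);
for s = 1:nSplits
    testStart = firstTest + (s - 1) * testSize;
    trainIdx{s} = 1 : testStart - 1;                     % all earlier data
    testIdx{s} = testStart : testStart + testSize - 1;   % next block
    fprintf('split %d: train 1..%d, test %d..%d\n', s, ...
            testStart - 1, testIdx{s}(1), testIdx{s}(end));
end
```

With these values the splits train on 1..4, 1..8, 1..12 and 1..16, and test on the four-step block that follows each one, so no split ever uses future data to predict the past.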

Greg Heath on 19 Jan 2018
Edited: Greg Heath on 19 Jan 2018
If you have to maintain the original time spacing, one way to use f-fold cross-validation (XVAL) with a time series is illustrated below for f = 10:
1. Divide the data into 10 contiguous blocks [ B1 B2 ... B10 ].
2. For i = 1:10, test on Bi and train on the rest.
3. For example, if i = 5,
a. Train on B1 to B4, using B1 for the initial conditions.
b. Continue training on B6 to B10, using B6 (NOT B4!) for the initial conditions.
c. Compute separate SSEs for B5 and ~B5 (all blocks except B5).
4. Combine the SSEs from i = 1:10 into 2 separate results, MSEtrn and MSEtst.
5. To obtain a net for production use, you can test each of the 10 nets on all of the data and combine them any way you choose (e.g., best, weighted average, ...).
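The bookkeeping in steps 1 to 4 can be sketched in base MATLAB. To keep the sketch self-contained it does not train a NARX net. Instead, a naive persistence predictor y(t) = x(t-1), a hypothetical stand-in for the trained net, is scored on each contiguous segment, and the first point of every segment serves only as the initial condition (one delay). The synthetic series x, f = 10, and pooling the SSEs by dividing by the number of scored points are my assumptions about how step 4 is meant.

```matlab
% Sketch of blocked f-fold cross-validation that keeps the time spacing.
% Assumptions (not from the post): a synthetic series x, a single delay,
% and a persistence predictor y(t) = x(t-1) standing in for a trained net.
x = sin(0.1 * (1:200));                  % illustrative time series
N = numel(x);
f = 10;                                  % number of blocks/folds
edges = round(linspace(0, N, f + 1));    % block Bj = edges(j)+1 : edges(j+1)

% Score one contiguous segment: its first point is only the initial
% condition (one delay); the remaining points are predicted and scored.
segSSE = @(seg) sum((x(seg(2:end)) - x(seg(2:end) - 1)).^2);
segN = @(seg) numel(seg) - 1;

SSEtrn = zeros(1, f); Ntrn = zeros(1, f);
SSEtst = zeros(1, f); Ntst = zeros(1, f);
for i = 1:f
    tst = edges(i) + 1 : edges(i + 1);              % Bi
    segs = {1 : edges(i), edges(i + 1) + 1 : N};    % B1..B(i-1) and B(i+1)..Bf
    for s = 1:2
        seg = segs{s};
        if numel(seg) > 1                           % skip empty segments
            SSEtrn(i) = SSEtrn(i) + segSSE(seg);    % ICs restart in each segment
            Ntrn(i) = Ntrn(i) + segN(seg);
        end
    end
    SSEtst(i) = segSSE(tst);
    Ntst(i) = segN(tst);
end

% Step 4: pool the per-fold SSEs into one training and one test MSE
MSEtrn = sum(SSEtrn) / sum(Ntrn)
MSEtst = sum(SSEtst) / sum(Ntst)
```

The key detail from step 3b is that the second training segment starts fresh at B(i+1): its first point is never predicted from the end of B(i-1), so the held-out block is never bridged.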
Hope this helps.
Thank you for formally accepting my answer
Greg
P.S. I favor the normalized MSE,
NMSE = MSE/mean(var(target',1))
which is normally in the range 0 <= NMSE <= 1 and is related to the statistical Rsquare (see Wikipedia) by
Rsquare = 1 - NMSE
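The P.S. formulas translate directly into base MATLAB. This sketch assumes targets and outputs are stored as matrices with one row per output variable and one column per timestep (for toolbox cell-array time series, cell2mat gives that layout). The short target series and the two trivial "models" are illustrative only; they show the limiting cases NMSE = 0 and NMSE = 1.

```matlab
% Normalized MSE and Rsquare, following NMSE = MSE/mean(var(target',1)).
% target/output: rows are output variables, columns are timesteps.
target = [1 3 2 5 4 6 8 7];             % illustrative target series
nmse = @(t, y) mean((t(:) - y(:)).^2) / mean(var(t', 1));

% A perfect model gives NMSE = 0 (Rsquare = 1)
NMSEperfect = nmse(target, target)

% The naive constant model (always predict the target mean) gives
% NMSE = 1 (Rsquare = 0); a model that beats the mean lies in between
naive = repmat(mean(target, 2), 1, size(target, 2));
NMSEnaive = nmse(target, naive)
RsquareNaive = 1 - NMSEnaive
```

Because the denominator is the MSE of the naive mean predictor, NMSE reads as the fraction of target variance the net leaves unexplained.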