Reproducibility in neural network

Richard
Richard on 8 Jan 2016
Edited: Greg Heath on 9 Jan 2016
I'm trying to break down the MATLAB neural network GUI by working out what each feature does. I'm keeping it simple by using the default training method (scg) and the MATLAB wine dataset for training/testing. For the time being, and for experimentation, I've removed the validation dataset and set the NN up with 50 hidden nodes.
What I can't work out is why the results it produces are exactly the same each time. It takes exactly the same number of epochs to reach the minimum gradient, the performance and gradient values are exactly the same, and the results produced in the confusion matrix are exactly the same. The only thing I can think of is that the data splitting and initialisation of weights are not randomised, but everywhere I look online suggests that (by default) MATLAB does indeed randomise those parameters.
What am I missing? Are the weights and datasets not randomised after all? Code being used is below.
% Load MATLAB default wine dataset.
[x1,t1] = wine_dataset;
% Create net, 50 hidden nodes.
net = patternnet(50);
% Split the data into a 75% training and 25% testing group. Validation
% removed.
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
% Train the data.
train(net,x1,t1);
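One quick way to test whether the weight initialisation is randomised (a sketch, not part of the original post) is to configure two fresh nets on the same data and compare their initial weight vectors:

```matlab
% Sketch: configure() initialises the weights, and getwb() returns all
% weights and biases as a single column vector, so two independently
% configured nets can be compared directly.
[x1, t1] = wine_dataset;
netA = configure(patternnet(50), x1, t1);
netB = configure(patternnet(50), x1, t1);
wA = getwb(netA);
wB = getwb(netB);
isequal(wA, wB)   % false if the initialisation really is random
```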

Accepted Answer

Greg Heath
Greg Heath on 9 Jan 2016
Edited: Greg Heath on 9 Jan 2016
YIKES!!! You have entered the creepy world of
(TRUMPETS PLEASE!)
OVERTRAINING AN OVERFIT NET!!!
You can prevent the overtraining by
1. Using a validation set. Look at the performance plot
and see the drastic log-scale difference in performance
between the training and testing subset performances.
2. Using regularization. With regression this means
replacing the performance function MSE with MSEREG
which is something like
MSEREG = MSE + lambda * norm(weights)
Therefore, if you use large weights or, more likely, too many weights due to too many hidden nodes, training will be terminated earlier.
However, with classification using patternnet, the default performance measure is CROSSENTROPY. I am not sure whether MATLAB supports combining it with regularization.
3. Use the Bayesian Regularization training function TRAINBR which, by default, uses Nval = 0 and a form of MSEREG. HOWEVER, I'm not sure whether MATLAB supports combining it with CROSSENTROPY.
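A minimal sketch of option 3, assuming this toolbox release accepts a training function name as the second argument to patternnet (and keeping Greg's caveat that TRAINBR's interaction with CROSSENTROPY is uncertain):

```matlab
% Sketch: train the same architecture with Bayesian regularization.
% TRAINBR penalises large weights, so it can stop effective training
% before the net memorises the training data.
[x1, t1] = wine_dataset;
net = patternnet(50, 'trainbr');    % Bayesian-regularization training
net.divideParam.trainRatio = 3/4;   % TRAINBR does not use a validation set
net.divideParam.valRatio   = 0;
net.divideParam.testRatio  = 1/4;
[net, tr] = train(net, x1, t1);
```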
4. Instead of preventing overtraining, you can prevent overfitting by just using fewer hidden nodes:
[x t] = wine_dataset;
[ I N ] = size(x) % [13 178 ]
[O N ] = size(t) % [ 3 178 ]
vart = mean(var(t',1)) % 0.21944
Ntst = round(0.25*N) % 45
Ntrn = N-Ntst % 133
Ntrneq = Ntrn*O % 399 training equations
5. When the net is configured with H = 50 hidden nodes, the number of unknown weights will be
Nw = (I+1)*H+(H+1)*O % 853 unknown weights
which is more than twice the number of training equations !!!
==> OVERFITTING!
H = 50
net = patternnet(H);
Nw = net.numWeightElements % 50 when unconfigured
net = configure(net,x,t);
Nw = net.numWeightElements % 853 when configured
Note: Training will automatically configure an unconfigured net
To avoid overfitting
Nw <= Ntrneq <==> H <= Hub
Hub = (Ntrneq-O)/(I+O+1) % 23.294
Therefore H <= 23 avoids overfitting
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
[ net tr y e ] = train(net,x,t);
% y = net(x); e = t-y % error
NMSE = mse(e)/vart % 0.017875
Rsq = 1- NMSE % 0.98213
Therefore, the net models 98.2% of the average target variance.
However, the net is overfitted. Therefore, the difference between the test and training performances is very important.
Moreover, the net is a classifier. Therefore, the difference between the training and test performances in terms of CROSSENTROPY and CLASSIFICATION RATE is more important!
indtrn = tr.trainInd;
indval = tr.valInd % Empty matrix: 1-by-0
indtst = tr.testInd;
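The per-subset comparison described above can be sketched as follows, using the net, outputs y, targets t, and training record tr from the code earlier in this answer:

```matlab
% Sketch: split targets and outputs by the division indices stored in
% the training record, then compare training vs. test performance.
ttrn = t(:, tr.trainInd);   ytrn = y(:, tr.trainInd);
ttst = t(:, tr.testInd);    ytst = y(:, tr.testInd);
% Crossentropy, patternnet's default performance measure
cetrn = crossentropy(net, ttrn, ytrn)
cetst = crossentropy(net, ttst, ytst)
% Classification error: confusion() returns the misclassified fraction
ctrn = confusion(ttrn, ytrn);
ctst = confusion(ttst, ytst);
PctErrTrn = 100*ctrn        % training error rate, percent
PctErrTst = 100*ctst        % test error rate, percent
```

A large gap between the training and test figures is the signature of overtraining an overfit net.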
TO BE CONTINUED
Hope this helps.
Thank you for formally accepting my answer
Greg

More Answers (1)

Star Strider
Star Strider on 8 Jan 2016
‘The only thing I can think of is that the data splitting and initialisation of weights are not randomised, but everywhere I look online suggests that (by default) MATLAB does indeed randomise those parameters.’
You likely solved this yourself. I couldn’t find it in the current documentation (and I opted not to hack the GUI), but it is possible to set the random number generator using the rng function so that the seed (and the subsequent outputs) are always the same. MATLAB does this frequently in its documentation sample code to force results that are the same as the demonstration code.
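For example (a sketch, on the assumption that the toolbox's weight initialisation and 'dividerand' data division both draw from MATLAB's default random number generator):

```matlab
% Sketch: a fixed seed reproduces the run exactly; reseeding from the
% clock should make successive runs differ.
[x1, t1] = wine_dataset;
rng(0)                          % fixed seed -> identical runs
net = patternnet(50);
[net, tr] = train(net, x1, t1);
rng('shuffle')                  % seed from the clock -> runs differ
```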
2 Comments
Richard
Richard on 8 Jan 2016
Thanks. I originally found the wine data through MATLAB's sample code, but I believe they included:
setdemorandstream(somenumber)
Instead of rng, which I've obviously omitted as I don't want to set the random stream. I think it would be very strange for MATLAB to include a fixed random variable within their ANN code/GUI.
Star Strider
Star Strider on 8 Jan 2016
I haven’t used the Neural Network Toolbox in a while, but it could be that the data are sufficiently well characterised that every net of similar architecture would produce the same results, regardless of the random number generator seed. You would probably have to test the data with a different classifier (perhaps a k-th nearest neighbor classifier) to see if that is the situation.
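A k-nearest-neighbour baseline along those lines might look like this (a sketch; it assumes the Statistics and Machine Learning Toolbox is available for fitcknn, and the choice of 5 neighbours is arbitrary):

```matlab
% Sketch: cross-validated k-NN error on the same wine data, as a
% sanity check on how separable the classes are.
[x, t] = wine_dataset;
labels = vec2ind(t)';                       % one-hot targets -> class indices
mdl = fitcknn(x', labels, 'NumNeighbors', 5);
cvmdl = crossval(mdl, 'KFold', 5);
knnErr = kfoldLoss(cvmdl)                   % cross-validated error rate
```

If the k-NN error is also near zero, the data may simply be easy enough that very different runs converge to near-identical results.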
