Reproducibility in neural network

Richard
Richard on 8 Jan 2016
Edited: Greg Heath on 9 Jan 2016
I'm trying to break down the MATLAB neural network GUI by working out what each feature does. I'm keeping it simple by using the default training method (scg) and the MATLAB wine dataset for training/testing. For the time being, and for experimentation, I've removed the validation dataset and set the NN up with 50 hidden nodes.
What I can't work out is why the results it produces are exactly the same each time. It takes exactly the same number of epochs to reach the minimum gradient, the performance and gradient values are exactly the same, and the results produced in the confusion matrix are exactly the same. The only thing I can think of is that the data splitting and initialisation of weights are not randomised, but everywhere I look online suggests that (by default) MATLAB does indeed randomise those parameters.
What am I missing? Are the weights and datasets not randomised after all? Code being used is below.
% Load MATLAB default wine dataset.
[x1,t1] = wine_dataset;
% Create net, 50 hidden nodes.
net = patternnet(50);
% Split the data into a 75% training and 25% testing group. Validation
% removed.
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
% Train the data.
train(net,x1,t1);
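One quick way to test whether the weight initialisation is randomised (a sketch, not part of the original post) is to configure two fresh nets on the same data and compare their initial weight vectors:

```matlab
% Sketch: configure() initialises the weights, and getwb() returns all
% weights and biases as a single column vector, so two independently
% configured nets can be compared directly.
[x1, t1] = wine_dataset;
netA = configure(patternnet(50), x1, t1);
netB = configure(patternnet(50), x1, t1);
wA = getwb(netA);
wB = getwb(netB);
isequal(wA, wB)   % false if the initialisation really is random
```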

Accepted Answer

Greg Heath
Greg Heath on 9 Jan 2016
Edited: Greg Heath on 9 Jan 2016
YIKES!!! You have entered the creepy world of
(TRUMPETS PLEASE!)
OVERTRAINING AN OVERFIT NET!!!
You can prevent the overtraining by
1. Using a validation set. Look at the performance plot
and see the drastic log-scale difference in performance
between the training and testing subset performances.
2. Using regularization. With regression this means
replacing the performance function MSE with MSEREG
which is something like
MSEREG = MSE + lambda * norm(weights)
Therefore, if you use large weights or, more likely, too many weights due to too many hidden nodes, training will be terminated earlier.
However, with classification using patternnet, the default performance measure is CROSSENTROPY. I am not sure whether MATLAB supports combining it with regularization.
3. Use the Bayesian Regularization training function TRAINBR which, by default, uses Nval = 0 and a form of MSEREG. HOWEVER, I'm not sure whether MATLAB supports combining it with CROSSENTROPY.
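A minimal sketch of option 3, assuming this toolbox release accepts a training function name as the second argument to patternnet (and keeping Greg's caveat that TRAINBR's interaction with CROSSENTROPY is uncertain):

```matlab
% Sketch: train the same architecture with Bayesian regularization.
% TRAINBR penalises large weights, so it can stop effective training
% before the net memorises the training data.
[x1, t1] = wine_dataset;
net = patternnet(50, 'trainbr');    % Bayesian-regularization training
net.divideParam.trainRatio = 3/4;   % TRAINBR does not use a validation set
net.divideParam.valRatio   = 0;
net.divideParam.testRatio  = 1/4;
[net, tr] = train(net, x1, t1);
```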
4. Instead of preventing overtraining, you can prevent overfitting by just using fewer hidden nodes:
[x t] = wine_dataset;
[ I N ] = size(x) % [13 178 ]
[O N ] = size(t) % [ 3 178 ]
vart = mean(var(t',1)) % 0.21944
Ntst = round(0.25*N) % 45
Ntrn = N-Ntst % 133
Ntrneq = Ntrn*O % 399 training equations
5. When the net is configured with H = 50 hidden nodes, the number of unknown weights will be
Nw = (I+1)*H+(H+1)*O % 853 unknown weights
which is more than twice the number of training equations !!!
==> OVERFITTING!
H = 50
net = patternnet(H);
Nw = net.numWeightElements % 50 when unconfigured
net = configure(net,x,t);
Nw = net.numWeightElements % 853 when configured
Note: Training will automatically configure an unconfigured net
To avoid overfitting
Nw <= Ntrneq <==> H <= Hub
Hub = (Ntrneq-O)/(I+O+1) % 23.294
Therefore H <= 23 avoids overfitting
net.divideParam.trainRatio = 3/4;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 1/4;
[ net tr y e ] = train(net,x,t);
% y = net(x); e = t-y % error
NMSE = mse(e)/vart % 0.017875
Rsq = 1- NMSE % 0.98213
Therefore, the net models 98.2% of the average target variance.
However, the net is overfitted. Therefore, the difference between the test and training performances is very important.
Moreover, the net is a classifier. Therefore, the difference between the training and test performances in terms of CROSSENTROPY and CLASSIFICATION RATE is more important!
indtrn = tr.trainInd;
indval = tr.valInd % Empty matrix: 1-by-0
indtst = tr.testInd;
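The per-subset comparison described above can be sketched as follows, using the net, outputs y, targets t, and training record tr from the code earlier in this answer:

```matlab
% Sketch: split targets and outputs by the division indices stored in
% the training record, then compare training vs. test performance.
ttrn = t(:, tr.trainInd);   ytrn = y(:, tr.trainInd);
ttst = t(:, tr.testInd);    ytst = y(:, tr.testInd);
% Crossentropy, patternnet's default performance measure
cetrn = crossentropy(net, ttrn, ytrn)
cetst = crossentropy(net, ttst, ytst)
% Classification error: confusion() returns the misclassified fraction
ctrn = confusion(ttrn, ytrn);
ctst = confusion(ttst, ytst);
PctErrTrn = 100*ctrn        % training error rate, percent
PctErrTst = 100*ctst        % test error rate, percent
```

A large gap between the training and test figures is the signature of overtraining an overfit net.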
TO BE CONTINUED
Hope this helps.
Thank you for formally accepting my answer
Greg

More Answers (1)

Star Strider
Star Strider on 8 Jan 2016
‘The only thing I can think of is that the data splitting and initialisation of weights are not randomised, but everywhere I look online suggests that (by default) MATLAB does indeed randomise those parameters.’
You likely solved this yourself. I couldn’t find it in the current documentation (and I opted not to hack the GUI), but it is possible to set the random number generator using the rng function so that the seed (and the subsequent outputs) are always the same. MATLAB does this frequently in its documentation sample code to force results that are the same as the demonstration code.
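For example (a sketch, on the assumption that the toolbox's weight initialisation and 'dividerand' data division both draw from MATLAB's default random number generator):

```matlab
% Sketch: a fixed seed reproduces the run exactly; reseeding from the
% clock should make successive runs differ.
[x1, t1] = wine_dataset;
rng(0)                          % fixed seed -> identical runs
net = patternnet(50);
[net, tr] = train(net, x1, t1);
rng('shuffle')                  % seed from the clock -> runs differ
```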
2 Comments
Richard
Richard on 8 Jan 2016
Thanks. I originally found the wine data through MATLAB's sample code, but I believe they included:
setdemorandstream(somenumber)
Instead of rng, which I've obviously omitted as I don't want to set the random stream. I think it would be very strange for MATLAB to include a fixed random variable within their ANN code/GUI.
Star Strider
Star Strider on 8 Jan 2016
I haven’t used the Neural Network Toolbox in a while, but it could be that the data are sufficiently well characterised that every net of similar architecture would produce the same results, regardless of the random number generator seed. You would probably have to test the data with a different classifier (perhaps a k-th nearest neighbor classifier) to see if that is the situation.
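A k-nearest-neighbour baseline along those lines might look like this (a sketch; it assumes the Statistics and Machine Learning Toolbox is available for fitcknn, and the choice of 5 neighbours is arbitrary):

```matlab
% Sketch: cross-validated k-NN error on the same wine data, as a
% sanity check on how separable the classes are.
[x, t] = wine_dataset;
labels = vec2ind(t)';                       % one-hot targets -> class indices
mdl = fitcknn(x', labels, 'NumNeighbors', 5);
cvmdl = crossval(mdl, 'KFold', 5);
knnErr = kfoldLoss(cvmdl)                   % cross-validated error rate
```

If the k-NN error is also near zero, the data may simply be easy enough that very different runs converge to near-identical results.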
