Why is training-set performance worse than validation and test performance for a pattern recognition network?

I tested a pattern recognition neural network (resilient backpropagation, two hidden layers [5 4]) on the iris dataset, and the mean-squared error on the training data often turns out worse than on the validation or test data. I get the same result with the Levenberg-Marquardt training algorithm. Does anyone have an idea what would cause this?
Edit: this does not happen on every run, but over 100 trials, more often than not one (or two) of the three curves (train, validation, test) was clearly separated from the rest.
load('iris_dataset')
%use default parameters
net = patternnet([5 4], 'trainrp');
net.trainParam.showWindow=0;
[net, tr] = train(net, irisInputs, irisTargets);
plotperform(tr);
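For what it's worth, the final per-subset errors can also be read numerically from the training record tr rather than from the plot. A minimal sketch (note that recent versions of patternnet default to crossentropy as the performance function, so set net.performFcn = 'mse' first if you want MSE numbers):
load('iris_dataset')
net = patternnet([5 4], 'trainrp');
net.trainParam.showWindow = 0;
% net.performFcn = 'mse'; % uncomment to compare MSE instead of crossentropy
[net, tr] = train(net, irisInputs, irisTargets);
y = net(irisInputs);
% perform() applies net.performFcn to each division of the data
perfTrn = perform(net, irisTargets(:, tr.trainInd), y(:, tr.trainInd))
perfVal = perform(net, irisTargets(:, tr.valInd), y(:, tr.valInd))
perfTst = perform(net, irisTargets(:, tr.testInd), y(:, tr.testInd))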

Accepted Answer

Greg Heath on 11 Nov 2015
I have used your choice of [ H1 H2 ] = [ 5 4 ] with TRAINRP and obtained behavior similar to what you describe. Although my MSE summary statistics are tabulated below, the phenomenon is easier to understand from the plots: they show clearly how the higher-variance nontraining performance estimates can occasionally come out better than the training performance.
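The key point is sample size: the validation and test MSEs here are each estimated from only 23 points, versus 104 for training, so they fluctuate much more from trial to trial. A quick illustrative simulation (standard normal errors as a stand-in for the network errors; this sketch is separate from the tabulated runs) shows the small-subset MSE estimate is roughly twice as noisy and comes out lower about half the time by chance alone:
rng(0)
nTrials = 1e4;
e104 = randn(104, nTrials); % stand-in "training" errors
e23 = randn(23, nTrials); % stand-in "validation/test" errors
mse104 = mean(e104.^2, 1);
mse23 = mean(e23.^2, 1);
fracSmallLower = mean(mse23 < mse104) % roughly 0.5
spread = [ std(mse23) std(mse104) ] % ~0.30 vs ~0.14: small subset ~2x noisier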
However, patternnet is a classification network whose defaults are H = 10, TRAINSCG, and CROSSENTROPY. Therefore, I also ran the default configuration.
Code and results are below:
close all, clear all, clc
default = 0
[ x t ] = iris_dataset;
[ I N ] = size(x) % [ 4 150 ]
[ O N ] = size(t) % [ 3 150 ]
% No. of training Equations
Ntst = round(0.15*N), Nval = Ntst
Ntrn = N-(Nval+Ntst) % trn/val/tst =104/23/23
Ntrneq = Ntrn*O % 312
% Number of hidden nodes & unknown weights
if default == 0
    H1 = 5, H2 = 4 % Jennifer
    net = patternnet( [ H1 H2 ], 'trainrp');
    % weights + biases: input->H1, H1->H2, H2->output
    Nw = (I+1)*H1+(H1+1)*H2+(H2+1)*O % 64
else % default = 1
    net = patternnet;
    H = 10, Nw = (I+1)*H+(H+1)*O % 83
end
view(net)
% net.trainParam.showWindow = 0;
vart = mean(var(t',1)) % 0.22222 reference MSE: one-hot targets, 3 equal classes => 1/3 - 1/9 = 2/9
Ntrials = 100
rng(4151941)
for i = 1:Ntrials
    net = configure(net, x, t); % reinitialize weights each trial
    [ net, tr, y, e ] = train(net, x, t);
    trnind = tr.trainInd; ttrn = t(:,trnind);
    valind = tr.valInd; tval = t(:,valind);
    tstind = tr.testInd; ttst = t(:,tstind);
    varttrn = mean(var(ttrn',1));
    vartval = mean(var(tval',1));
    varttst = mean(var(ttst',1));
    % normalized performance: R^2 = 1 - MSE/referenceMSE
    Rsq(i,1) = 1 - mse(e)/vart;
    Rsqtrn(i,1) = 1 - mse(e(:,trnind))/varttrn;
    Rsqval(i,1) = 1 - mse(e(:,valind))/vartval;
    Rsqtst(i,1) = 1 - mse(e(:,tstind))/varttst;
end
result = [Rsq Rsqtrn Rsqval Rsqtst ]
maxresult = max(result)
medresult = median(result)
meanresult = mean(result)
minresult = min(result)
stdresult = std(result)
if default == 0
    % result     = [ Rsq     Rsqtrn   Rsqval   Rsqtst  ]
    % maxresult  =   0.98    1        1        0.99998
    % medresult  =   0.937   0.949    0.939    0.882
    % meanresult =   0.932   0.945    0.917    0.873
    % minresult  =   0.845   0.864    0.563    0.549
    % stdresult  =   0.027   0.029    0.091    0.113
else % default = 1
    % result     = [ Rsq     Rsqtrn   Rsqval   Rsqtst  ]
    % maxresult  =   0.98    1        1        1
    % medresult  =   0.954   0.95634  0.94695  0.9227
    % meanresult =   0.949   0.95838  0.93826  0.9070
    % minresult  =   0.804   0.84454  0.78548  0.5435
    % stdresult  =   0.021   0.02421  0.05878  0.0935
end
figure
subplot(221), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtrn, 'b', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqtrn(blue)' ] )
subplot(222), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqval, 'r', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqval(red)' ])
subplot(223), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtst, 'g', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqtst(green)' ])
subplot(224), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtrn, 'b', 'LineWidth',2)
plot( Rsqval, 'r', 'LineWidth',2)
plot( Rsqtst, 'g', 'LineWidth',2)
ylim( [ 0.5 1 ] )
title( [ 'Rsq, Rsqtrn, Rsqval & Rsqtst' ])
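If you want a single number for how often this happens, a quick follow-up (assuming the Rsq vectors from the loop above are still in the workspace) is to count the trials in which the training R^2 falls below a nontraining R^2:
NtrnBelowVal = sum(Rsqtrn < Rsqval) % trials where validation looks better than training
NtrnBelowTst = sum(Rsqtrn < Rsqtst) % trials where test looks better than training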

More Answers (0)
