Why is training-set performance worse than validation and test performance for a pattern recognition network?

I tested a pattern recognition neural network (resilient backpropagation, two hidden layers [5 4]) on the iris dataset, and the mean-squared error on the training data often turns out worse than on the validation or test data. I get the same result with the Levenberg-Marquardt training algorithm. Does anyone have an idea what would cause this?
Edit: this does not happen on every run, but over 100 trials, more often than not one (or two) of the three curves (train, validation, test) was clearly separated from the rest.
load('iris_dataset')
%use default parameters
net = patternnet([5 4], 'trainrp');
net.trainParam.showWindow=0;
[net, tr] = train(net, irisInputs, irisTargets);
plotperform(tr);
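For what it's worth, the final per-subset errors can also be read numerically from the training record tr rather than from the plot. A minimal sketch (note that recent versions of patternnet default to crossentropy as the performance function, so set net.performFcn = 'mse' first if you want MSE numbers):
load('iris_dataset')
net = patternnet([5 4], 'trainrp');
net.trainParam.showWindow = 0;
% net.performFcn = 'mse'; % uncomment to compare MSE instead of crossentropy
[net, tr] = train(net, irisInputs, irisTargets);
y = net(irisInputs);
% perform() applies net.performFcn to each division of the data
perfTrn = perform(net, irisTargets(:, tr.trainInd), y(:, tr.trainInd))
perfVal = perform(net, irisTargets(:, tr.valInd), y(:, tr.valInd))
perfTst = perform(net, irisTargets(:, tr.testInd), y(:, tr.testInd))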

Accepted Answer

Greg Heath on 11 Nov 2015
I have used your choice of [ H1 H2 ] = [ 5 4 ] with TRAINRP and obtained behavior similar to what you describe. Although my MSE summary statistics are tabulated below, the phenomenon is easier to understand from the plots: they show clearly how the higher-variance nontraining performance estimates can occasionally come out better than the training performance.
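The key point is sample size: the validation and test MSEs here are each estimated from only 23 points, versus 104 for training, so they fluctuate much more from trial to trial. A quick illustrative simulation (standard normal errors as a stand-in for the network errors; this sketch is separate from the tabulated runs) shows the small-subset MSE estimate is roughly twice as noisy and comes out lower about half the time by chance alone:
rng(0)
nTrials = 1e4;
e104 = randn(104, nTrials); % stand-in "training" errors
e23 = randn(23, nTrials); % stand-in "validation/test" errors
mse104 = mean(e104.^2, 1);
mse23 = mean(e23.^2, 1);
fracSmallLower = mean(mse23 < mse104) % roughly 0.5
spread = [ std(mse23) std(mse104) ] % ~0.30 vs ~0.14: small subset ~2x noisier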
However, patternnet is a classification network whose defaults are H = 10, TRAINSCG, and CROSSENTROPY. Therefore, I also ran the default configuration.
Code and results are below:
close all, clear all, clc
default = 0
[ x t ] = iris_dataset;
[ I N ] = size(x) % [ 4 150 ]
[ O N ] = size(t) % [ 3 150 ]
% No. of training Equations
Ntst = round(0.15*N), Nval = Ntst
Ntrn = N-(Nval+Ntst) % trn/val/tst =104/23/23
Ntrneq = Ntrn*O % 312
% Number of hidden nodes & unknown weights
if default == 0
    H1 = 5, H2 = 4 % Jennifer
    net = patternnet( [ H1 H2 ], 'trainrp');
    % weights + biases: input->H1, H1->H2, H2->output
    Nw = (I+1)*H1+(H1+1)*H2+(H2+1)*O % 64
else % default = 1
    net = patternnet;
    H = 10, Nw = (I+1)*H+(H+1)*O % 83
end
view(net)
% net.trainParam.showWindow = 0;
vart = mean(var(t',1)) % 0.22222 reference MSE: one-hot targets, 3 equal classes => 1/3 - 1/9 = 2/9
Ntrials = 100
rng(4151941)
for i = 1:Ntrials
    net = configure(net, x, t); % reinitialize weights each trial
    [ net, tr, y, e ] = train(net, x, t);
    trnind = tr.trainInd; ttrn = t(:,trnind);
    valind = tr.valInd; tval = t(:,valind);
    tstind = tr.testInd; ttst = t(:,tstind);
    varttrn = mean(var(ttrn',1));
    vartval = mean(var(tval',1));
    varttst = mean(var(ttst',1));
    % normalized performance: R^2 = 1 - MSE/referenceMSE
    Rsq(i,1) = 1 - mse(e)/vart;
    Rsqtrn(i,1) = 1 - mse(e(:,trnind))/varttrn;
    Rsqval(i,1) = 1 - mse(e(:,valind))/vartval;
    Rsqtst(i,1) = 1 - mse(e(:,tstind))/varttst;
end
result = [Rsq Rsqtrn Rsqval Rsqtst ]
maxresult = max(result)
medresult = median(result)
meanresult = mean(result)
minresult = min(result)
stdresult = std(result)
if default == 0
    % result     = [ Rsq     Rsqtrn   Rsqval   Rsqtst  ]
    % maxresult  =   0.98    1        1        0.99998
    % medresult  =   0.937   0.949    0.939    0.882
    % meanresult =   0.932   0.945    0.917    0.873
    % minresult  =   0.845   0.864    0.563    0.549
    % stdresult  =   0.027   0.029    0.091    0.113
else % default = 1
    % result     = [ Rsq     Rsqtrn   Rsqval   Rsqtst  ]
    % maxresult  =   0.98    1        1        1
    % medresult  =   0.954   0.95634  0.94695  0.9227
    % meanresult =   0.949   0.95838  0.93826  0.9070
    % minresult  =   0.804   0.84454  0.78548  0.5435
    % stdresult  =   0.021   0.02421  0.05878  0.0935
end
figure
subplot(221), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtrn, 'b', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqtrn(blue)' ] )
subplot(222), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqval, 'r', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqval(red)' ])
subplot(223), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtst, 'g', 'LineWidth',2)
ylim( [ 0.5 1 ])
title( [ 'Rsq(black) & Rsqtst(green)' ])
subplot(224), hold on
plot( Rsq, 'k', 'LineWidth',2)
plot( Rsqtrn, 'b', 'LineWidth',2)
plot( Rsqval, 'r', 'LineWidth',2)
plot( Rsqtst, 'g', 'LineWidth',2)
ylim( [ 0.5 1 ] )
title( [ 'Rsq, Rsqtrn, Rsqval & Rsqtst' ])
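If you want a single number for how often this happens, a quick follow-up (assuming the Rsq vectors from the loop above are still in the workspace) is to count the trials in which the training R^2 falls below a nontraining R^2:
NtrnBelowVal = sum(Rsqtrn < Rsqval) % trials where validation looks better than training
NtrnBelowTst = sum(Rsqtrn < Rsqtst) % trials where test looks better than training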

More Answers (0)
