Why are the results different when using trainNetwork and a custom training loop?

I have defined a custom layer and constructed a simple network. When I train the network with trainNetwork and with a custom training loop, the results are different, even though the parameters and data are the same. The code follows.
1. This is the network trained with the trainNetwork function:
clear
clc
rng(0)
%% parameters
nMFs = 32;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
% data = xx;
data=data(all(~isnan(data),2),:); % remove rows with missing values
data = removeconstantrows(data')'; % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
% X=zscore(X);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M);
%% learnable parameters initial method
switch init_method
    case 'FCM'
        % FCM initialization
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random initialization
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % evenly spaced (linspace) initialization
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% layers
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1');
    regressionLayer];
options = trainingOptions('adam', ...
    'GradientDecayFactor',0.9, ...
    'SquaredGradientDecayFactor',0.999, ...
    'Epsilon',1e-8, ...
    'MaxEpochs',50, ...
    'MiniBatchSize',128, ...
    'InitialLearnRate',0.01, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',100, ...
    'LearnRateDropFactor',1, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{XTest,yTest}, ...
    'ValidationFrequency',10, ...
    'ValidationPatience',1000, ...
    'OutputNetwork','best-validation-loss', ...
    'L2Regularization',0, ...
    'ResetInputNormalization',false, ...
    'GradientThreshold',inf, ...
    'Plots','training-progress');
%% Train the nn
tic
[net,tinfo] = trainNetwork(XTrain,yTrain,layers,options);
toc
The result: the minimum RMSE shown in the training plot is about 0.344.
2. This is the same network trained with a custom training loop:
clear
clc
rng(0)
%% parameters
nMFs = 32;
learnRate = 0.01;
decay = 1;
gradientDecayFactor = 0.9;
squaredGradientDecayFactor = 0.999;
epsilon = 1e-8;
numEpochs = 50;
miniBatchSize = 128;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
data=data(all(~isnan(data),2),:); % remove rows with missing values
data = removeconstantrows(data')'; % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
XTest = dlarray(XTest','CB');
yTest = dlarray(yTest','CB');
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M); % not used
%% initial method
switch init_method
    case 'FCM'
        % FCM initialization
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random initialization
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % evenly spaced (linspace) initialization
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            % Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% data format
dsXTrain = arrayDatastore(XTrain);
dsyTrain = arrayDatastore(yTrain);
dsTrain = combine(dsXTrain,dsyTrain);
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1')];
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);
plots = "training-progress";
% plots = "nan";
% Train Model
% Train the model using a custom training loop.
velocity = []; % only needed for the SGDM solver; the Adam updates below do not use it
% accfun = dlaccelerate(@modelGradients);
% clearCache(accfun)
%% mini batch
mbq = minibatchqueue(dsTrain, ...
    'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFcn',@preprocessMiniBatch, ...
    'MiniBatchFormat',{'CB','CB'});
% Initialize the training progress plot.
if plots == "training-progress"
    figure
    lineLossTrain = animatedline('Color',[0.85 0.325 0.098]);
    lineLossTest = animatedline('Color',[0 0 0]);
    ylim([0 inf])
    xlabel("Iteration")
    ylabel("Loss")
    grid on
end
averageGrad = [];
averageSqGrad = [];
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    learnRate = learnRate*decay;
    % Shuffle data.
    shuffle(mbq)
    % Loop over mini-batches.
    while hasdata(mbq)
        iteration = iteration + 1;
        % Read mini-batch of data.
        [dlX1,dlY] = next(mbq);
        % Evaluate the model gradients, state, and loss using dlfeval and the
        % modelGradients function and update the network state.
        [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX1,dlY);
        dlnet.State = state;
        % Update the network parameters using the Adam optimizer.
        % [dlnet, velocity] = adamupdate(dlnet, gradients, velocity, learnRate, momentum);
        [dlnet,averageGrad,averageSqGrad] = ...
            adamupdate(dlnet, gradients, ...
            averageGrad, averageSqGrad, iteration, ...
            learnRate, gradientDecayFactor, squaredGradientDecayFactor, epsilon);
        % Compute the validation error.
        yPreVal = predict(dlnet,XTest);
        % yPreVal(isnan(yPreVal)) = yTest(isnan(yPreVal));
        test_error = sqrt(mse(yPreVal,yTest))
        if plots == "training-progress"
            % Display the training progress.
            D = duration(0,0,toc(start),'Format','hh:mm:ss');
            % completionPercentage = round(iteration/numIterations*100,0);
            title("Epoch: " + epoch + ", Elapsed: " + string(D));
            addpoints(lineLossTrain,iteration,double(gather(extractdata(sqrt(loss)))))
            addpoints(lineLossTest,iteration,double(extractdata(test_error)))
            drawnow limitrate
        end
    end
end
function [gradients,state,loss] = modelGradients(dlnet,dlX1,Y)
    % Forward pass, loss, and gradients with respect to the learnable parameters.
    [dlYPred,state] = forward(dlnet,dlX1);
    loss = mse(dlYPred,Y);
    gradients = dlgradient(loss,dlnet.Learnables);
end
function [X,Y] = preprocessMiniBatch(XCell,YCell)
    % Extract feature data from cell and concatenate.
    X = cat(1,XCell{:});
    X = X';
    % Extract label data from cell and concatenate.
    Y = cat(2,YCell{:});
end
The result: the minimum RMSE is about 0.24.
Why are the results so different?
(TSK1 is my custom layer with a backward function.)
4 comments
Samuel Somuyiwa on 2 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so it should be the right function to use in this case. Did you try sqrt(l2loss(x,y))?
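For example, the validation-error line in the custom loop could be changed as follows (a minimal sketch using the variable names from the question; it assumes l2loss with its default 'batch-size' normalization and a single response per observation):
% instead of: test_error = sqrt(mse(yPreVal,yTest))
test_error = sqrt(l2loss(yPreVal,yTest));   % no factor of 1/2, comparable to the trainNetwork plot
% equivalently, for a single response: test_error = sqrt(2*mse(yPreVal,yTest));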
Rik on 3 Mar 2022
Comment posted as flag by @shuai ma:
They are the same now, thanks so much.


Accepted Answer

Samuel Somuyiwa on 14 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so the right way to compute RMSE in this case is sqrt(l2loss(x,y)).
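A quick numeric check of that relationship (a minimal sketch with made-up values; rmseA and rmseB should match here because there is a single response per observation):
Y = dlarray([1 2 3 4],'CB');         % hypothetical predictions
T = dlarray([1.5 2.5 2 4.5],'CB');   % hypothetical targets
halfMSE = mse(Y,T);                  % mse includes the factor of 1/2
rmseA = sqrt(2*halfMSE);             % undo the factor of 1/2
rmseB = sqrt(l2loss(Y,T));           % l2loss has no factor of 1/2
This is also consistent with the numbers reported above: 0.344/sqrt(2) ≈ 0.243, which matches the ~0.24 minimum from the custom training loop.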

More Answers (0)

Release: R2021b
