Why are the results different when using trainNetwork and a custom training loop?

I have defined a custom layer and constructed a simple network. When I train the network with trainNetwork and with a custom training loop, the results are different, even though the parameters and data are the same. The code follows.
1. This is the network trained with the trainNetwork function:
clear
clc
rng(0)
%% parameters
nMFs = 32;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
% data = xx;
data=data(all(~isnan(data),2),:); % remove rows with missing values
data = removeconstantrows(data')'; % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
% X=zscore(X);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M);
%% learnable parameters initial method
switch init_method
    case 'FCM'
        % FCM initialization
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random initialization
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % evenly spaced (linspace) initialization
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% layers
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1');
    regressionLayer];
options = trainingOptions('adam', ...
    'GradientDecayFactor',0.9, ...
    'SquaredGradientDecayFactor',0.999, ...
    'Epsilon',1e-8, ...
    'MaxEpochs',50, ...
    'MiniBatchSize',128, ...
    'InitialLearnRate',0.01, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',100, ...
    'LearnRateDropFactor',1, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{XTest,yTest}, ...
    'ValidationFrequency',10, ...
    'ValidationPatience',1000, ...
    'OutputNetwork','best-validation-loss', ...
    'L2Regularization',0, ...
    'ResetInputNormalization',false, ...
    'GradientThreshold',inf, ...
    'Plots','training-progress');
%% Train the nn
tic
[net,tinfo] = trainNetwork(XTrain,yTrain,layers,options);
toc
The result: the minimum RMSE shown in the training plot is about 0.344.
2. This is the same network trained with a custom training loop:
clear
clc
rng(0)
%% parameters
nMFs = 32;
learnRate = 0.01;
decay = 1;
gradientDecayFactor = 0.9;
squaredGradientDecayFactor = 0.999;
epsilon = 1e-8;
numEpochs = 50;
miniBatchSize = 128;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
data=data(all(~isnan(data),2),:); % remove rows with missing values
data = removeconstantrows(data')'; % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
XTest = dlarray(XTest','CB');
yTest = dlarray(yTest','CB');
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M); % not used
%% initial method
switch init_method
    case 'FCM'
        % FCM initialization
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random initialization
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % evenly spaced (linspace) initialization
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            % Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% data format
dsXTrain = arrayDatastore(XTrain);
dsyTrain = arrayDatastore(yTrain);
dsTrain = combine(dsXTrain,dsyTrain);
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1')];
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);
plots = "training-progress";
% plots = "nan";
% Train Model
% Train the model using a custom training loop.
velocity = []; % only needed for the SGDM solver; the Adam updates below do not use it
% accfun = dlaccelerate(@modelGradients);
% clearCache(accfun)
%% mini batch
mbq = minibatchqueue(dsTrain, ...
    'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFcn',@preprocessMiniBatch, ...
    'MiniBatchFormat',{'CB','CB'});
% Initialize the training progress plot.
if plots == "training-progress"
    figure
    lineLossTrain = animatedline('Color',[0.85 0.325 0.098]);
    lineLossTest = animatedline('Color',[0 0 0]);
    ylim([0 inf])
    xlabel("Iteration")
    ylabel("Loss")
    grid on
end
averageGrad = [];
averageSqGrad = [];
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    learnRate = learnRate*decay;
    % Shuffle data.
    shuffle(mbq)
    % Loop over mini-batches.
    while hasdata(mbq)
        iteration = iteration + 1;
        % Read mini-batch of data.
        [dlX1,dlY] = next(mbq);
        % Evaluate the model gradients, state, and loss using dlfeval and the
        % modelGradients function and update the network state.
        [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX1,dlY);
        dlnet.State = state;
        % Update the network parameters using the Adam optimizer.
        % [dlnet, velocity] = adamupdate(dlnet, gradients, velocity, learnRate, momentum);
        [dlnet,averageGrad,averageSqGrad] = ...
            adamupdate(dlnet, gradients, ...
            averageGrad, averageSqGrad, iteration, ...
            learnRate, gradientDecayFactor, squaredGradientDecayFactor, epsilon);
        % Compute the validation error.
        yPreVal = predict(dlnet,XTest);
        % yPreVal(isnan(yPreVal)) = yTest(isnan(yPreVal));
        test_error = sqrt(mse(yPreVal,yTest))
        if plots == "training-progress"
            % Display the training progress.
            D = duration(0,0,toc(start),'Format','hh:mm:ss');
            % completionPercentage = round(iteration/numIterations*100,0);
            title("Epoch: " + epoch + ", Elapsed: " + string(D));
            addpoints(lineLossTrain,iteration,double(gather(extractdata(sqrt(loss)))))
            addpoints(lineLossTest,iteration,double(extractdata(test_error)))
            drawnow limitrate
        end
    end
end
function [gradients,state,loss] = modelGradients(dlnet,dlX1,Y)
    % Forward pass, loss, and gradients with respect to the learnable parameters.
    [dlYPred,state] = forward(dlnet,dlX1);
    loss = mse(dlYPred,Y);
    gradients = dlgradient(loss,dlnet.Learnables);
end
function [X,Y] = preprocessMiniBatch(XCell,YCell)
    % Extract feature data from cell and concatenate.
    X = cat(1,XCell{:});
    X = X';
    % Extract label data from cell and concatenate.
    Y = cat(2,YCell{:});
end
The result: the minimum RMSE is about 0.24.
Why are the results so different?
(TSK1 is my custom layer with a backward function.)
4 comments
Samuel Somuyiwa on 2 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so it should be the right function to use in this case. Did you try sqrt(l2loss(x,y))?
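For example, the validation-error line in the custom loop could be changed as follows (a minimal sketch using the variable names from the question; it assumes l2loss with its default 'batch-size' normalization and a single response per observation):
% instead of: test_error = sqrt(mse(yPreVal,yTest))
test_error = sqrt(l2loss(yPreVal,yTest));   % no factor of 1/2, comparable to the trainNetwork plot
% equivalently, for a single response: test_error = sqrt(2*mse(yPreVal,yTest));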
Rik on 3 Mar 2022
Comment posted as flag by @shuai ma:
They are the same now, thanks so much.


Accepted Answer

Samuel Somuyiwa on 14 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so the right way to compute RMSE in this case is sqrt(l2loss(x,y)).
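A quick numeric check of that relationship (a minimal sketch with made-up values; rmseA and rmseB should match here because there is a single response per observation):
Y = dlarray([1 2 3 4],'CB');         % hypothetical predictions
T = dlarray([1.5 2.5 2 4.5],'CB');   % hypothetical targets
halfMSE = mse(Y,T);                  % mse includes the factor of 1/2
rmseA = sqrt(2*halfMSE);             % undo the factor of 1/2
rmseB = sqrt(l2loss(Y,T));           % l2loss has no factor of 1/2
This is also consistent with the numbers reported above: 0.344/sqrt(2) ≈ 0.243, which matches the ~0.24 minimum from the custom training loop.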

More Answers (0)

Release: R2021b
