Weights and Biases Not Updating in Custom MATLAB dlnetwork Training Loop

6 views (last 30 days)
SYED
SYED on 30 Jun 2024
Edited: Ruth on 9 Jul 2024
Hello MATLAB Community,
I am currently working on training a custom autoencoder network using MATLAB's dlnetwork framework. Despite setting up a manual training loop with gradient computation and parameter updates using adamupdate, I've observed that the weights and biases of the network do not change between iterations. Additionally, all biases remain zero throughout training. I am using MATLAB R2024a. Here are the relevant parts of my code:
trailingAvgG = [];
trailingAvgSqG = [];
trailingAvgR = [];
trailingAvgSqR = [];
miniBatchSize = 600;
learnRate = 0.01;
layers = [
sequenceInputLayer(1,MinLength = 2048)
modwtLayer('Level',5,'IncludeLowpass',false,'SelectedLevels',2:5,"Wavelet","sym2")
flattenLayer
convolution1dLayer(128,8,Padding="same",Stride=8)
batchNormalizationLayer()
tanhLayer
maxPooling1dLayer(2,Padding="same")
convolution1dLayer(32,8,Padding="same",Stride=4)
batchNormalizationLayer
tanhLayer
maxPooling1dLayer(2,Padding="same")
transposedConv1dLayer(32,8,Cropping="same",Stride=4)
tanhLayer
transposedConv1dLayer(128,8,Cropping="same",Stride=8)
tanhLayer
bilstmLayer(8)
fullyConnectedLayer(8)
dropoutLayer(0.2)
fullyConnectedLayer(4)
dropoutLayer(0.2)
fullyConnectedLayer(1)];
net = dlnetwork(layers);
numEpochs = 200;
%dataMat = 1x2048x22275
dldata = arrayDatastore(dataMat,IterationDimension=3);
mbq = minibatchqueue(dldata, ...
    MiniBatchSize=miniBatchSize, ...
    OutputEnvironment="cpu");
iteration = 0;
for epoch = 1:numEpochs
    shuffle(mbq);
    while hasdata(mbq)
        iteration = iteration+1;
        [XTrain] = next(mbq);
        XTrain = dlarray(XTrain,"TBC"); % 1(C)x600(B)x2048(T)
        [datafromRNN,lossR] = RNN_model(XTrain,net);
        [gradientsR] = dlfeval(@gradientFunction,mean(lossR), net);
        [net,trailingAvgR,trailingAvgSqR] = adamupdate(net,gradientsR, ...
            trailingAvgR,trailingAvgSqR,iteration,learnRate);
        disp(['Iteration ', num2str(iteration), ', Loss: ', num2str(extractdata(lossR))]);
    end
end
function [gradientsR] = gradientFunction(lossR, net)
gradientsR = dlgradient(lossR, net.Learnables);
end
function [datafromRNN,loss] = RNN_model(data,net)
z = data;
[coder, last] = forward(net, z, 'Outputs', {'maxpool1d_2', 'fc_3'});
loss = mse(last,z);
end
Questions:
  1. Why are the weights and biases not updating, and why do the biases remain zero?
  2. How can I ensure that the gradients computed are correct and being applied effectively?
  3. Are there any specific settings or modifications I should consider to resolve this issue?
Any insights or suggestions would be greatly appreciated!

Answers (2)

Umar
Umar on 30 Jun 2024

Hi Syed,

To address the problem of weights and biases not updating during training, we need to ensure that the gradients are computed accurately and that the parameter updates are applied correctly. Let's make the necessary adjustments to the code:

% Initialize Adam optimizer parameters
trailingAvgG = [];
trailingAvgSqG = [];
trailingAvgR = [];
trailingAvgSqR = [];

% Define learning parameters
miniBatchSize = 600;
learnRate = 0.01;

% Define the neural network layers
layers = [
    % Your network layers here
    ];

net = dlnetwork(layers);
numEpochs = 200;

% Assuming 'dataMat' is your input data
dldata = arrayDatastore(dataMat, 'IterationDimension', 3);
mbq = minibatchqueue(dldata, 'MiniBatchSize', miniBatchSize, 'OutputEnvironment', 'cpu');

iteration = 0;
for epoch = 1:numEpochs
    shuffle(mbq);
    while hasdata(mbq)
        iteration = iteration + 1;
        [XTrain] = next(mbq);
        XTrain = dlarray(XTrain, 'TBC');
        % Call the RNN model function
        [datafromRNN, lossR] = RNN_model(XTrain, net);
        % Compute gradients and update parameters
        [gradientsR] = dlfeval(@gradientFunction, mean(lossR), net);
        [net, trailingAvgR, trailingAvgSqR] = adamupdate(net, gradientsR, trailingAvgR, trailingAvgSqR, iteration, learnRate);
        disp(['Iteration ', num2str(iteration), ', Loss: ', num2str(extractdata(lossR))]);
    end
end

function [gradientsR] = gradientFunction(lossR, net)
    gradientsR = dlgradient(lossR, net.Learnables);
end

function [datafromRNN, loss] = RNN_model(data, net)
    z = data;
    [coder, last] = forward(net, z, 'Outputs', {'maxpool1d_2', 'fc_3'});
    loss = mse(last, z);
end

By ensuring that the gradients are correctly computed and the Adam optimizer updates the parameters accordingly, the weights and biases of the network should now change between iterations, leading to effective training progress.

Now let’s answer your questions.

Why are the weights and biases not updating, and why do the biases remain zero?

If the biases are not updating and remain zero, it could indicate a problem with the initialization or a learning rate that is too low. Ensure that the biases are initialized correctly, preferably with small random values to break symmetry. Additionally, consider adjusting the learning rate to a value that allows the biases to update effectively.
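As a quick sanity check, here is a minimal sketch (using the variable names from the loop above, and not part of the original post) that temporarily instruments the existing adamupdate call to report how much each learnable parameter actually moves in one iteration:

% Instrumented version of the existing adamupdate call (assumes net, gradientsR,
% trailingAvgR, trailingAvgSqR, iteration and learnRate already exist in the loop).
paramsBefore = net.Learnables.Value;   % cell array of dlarray parameters

[net,trailingAvgR,trailingAvgSqR] = adamupdate(net,gradientsR, ...
    trailingAvgR,trailingAvgSqR,iteration,learnRate);

paramsAfter = net.Learnables.Value;

% Maximum absolute change per learnable parameter; all zeros means no update happened.
maxChange = cellfun(@(b,a) max(abs(extractdata(a(:) - b(:)))), paramsBefore, paramsAfter);
disp(table(net.Learnables.Layer, net.Learnables.Parameter, maxChange, ...
    VariableNames = ["Layer" "Parameter" "MaxChange"]))

If every entry of MaxChange is zero, the problem is upstream of adamupdate, i.e. the gradients themselves are zero or empty.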

How can I ensure that the gradients computed are correct and being applied effectively?

To verify the correctness of computed gradients and their effective application, you can employ various techniques:

Gradient Checking: Implement numerical gradient checking to compare the computed gradients with numerical approximations. Discrepancies may indicate issues in the gradient computation.
Visualizing Gradients: Plot and analyze the gradients to ensure they follow expected patterns and magnitudes (see the sketch below).
Debugging Gradient Functions: Review the gradient computation function (gradientFunction) to ensure it correctly calculates gradients with respect to the loss.
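For example, a minimal sketch (assuming gradientsR is the Learnables-style table returned by dlfeval in the training loop above) that prints the norm of each gradient, making all-zero or exploding gradients easy to spot:

% Inspect gradient magnitudes for every learnable parameter.
gradNorm = cellfun(@(g) norm(extractdata(g(:))), gradientsR.Value);
disp(table(gradientsR.Layer, gradientsR.Parameter, gradNorm, ...
    VariableNames = ["Layer" "Parameter" "GradientNorm"]))
% A norm of zero for a parameter means no gradient is flowing back to it.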

Are there any specific settings or modifications I should consider to resolve this issue?

To enhance the training process and address the issues at hand, consider the following settings and modifications:

Learning Rate Adjustment: Experiment with different learning rates to find an optimal value that facilitates weight and bias updates without causing instability.
Regularization Techniques: Introduce regularization methods such as L1 or L2 regularization to prevent overfitting and encourage smoother weight updates (see the sketch below).
Batch Normalization: Verify the implementation of the batch normalization layers to stabilize training and improve gradient flow.
Network Architecture: Evaluate the complexity and design of your neural network architecture to ensure it is suitable for the task at hand and facilitates effective weight updates.
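As an illustration, here is a minimal sketch (not part of the original post) of how a simple learning-rate decay and an L2 weight-decay term could be added inside the training loop; initialLearnRate, decayRate and weightDecay are hypothetical values you would tune:

% Hypothetical hyperparameters (choose values appropriate for your data).
initialLearnRate = 0.01;
decayRate = 0.005;
weightDecay = 1e-4;

% Inside the training loop, after computing gradientsR for this iteration:
learnRate = initialLearnRate / (1 + decayRate*iteration);   % simple 1/t decay

% Add an L2 (weight decay) term to every gradient before the Adam step.
gradientsR = dlupdate(@(g,w) g + weightDecay*w, gradientsR, net.Learnables);

[net,trailingAvgR,trailingAvgSqR] = adamupdate(net,gradientsR, ...
    trailingAvgR,trailingAvgSqR,iteration,learnRate);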

I hope this will help resolve your issues.


Ruth
Ruth on 9 Jul 2024
Edited: Ruth on 9 Jul 2024
Hi Syed,
The forward call should also be inside a function called by dlfeval to ensure auto diff occurs as expected. I would recommend combining the loss and gradient calculations into one function to do this:
function [loss,gradientsR] = RNN_model(data,net)
z = data;
[coder, last] = forward(net, z, 'Outputs', {'maxpool1d_2', 'fc_3'});
loss = mean(mse(last,z));
gradientsR = dlgradient(loss, net.Learnables);
end
This should be called inside the loop using dlfeval:
[lossR, gradientsR] = dlfeval(@RNN_model,XTrain,net);
You might need to edit this a bit since I'm not completely sure of your code (for example, where datafromRNN comes from); however, moving the loss and gradient calculation into one function called by dlfeval should resolve the issue.
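Putting it together, here is a minimal sketch (assuming the variable names from the original post) of how the body of the while loop might look once the combined function is used:

while hasdata(mbq)
    iteration = iteration + 1;
    XTrain = next(mbq);
    XTrain = dlarray(XTrain,"TBC");

    % Forward pass, loss, and gradients all happen inside dlfeval,
    % so automatic differentiation can trace the forward call.
    [lossR, gradientsR] = dlfeval(@RNN_model, XTrain, net);

    [net,trailingAvgR,trailingAvgSqR] = adamupdate(net,gradientsR, ...
        trailingAvgR,trailingAvgSqR,iteration,learnRate);

    disp(['Iteration ', num2str(iteration), ', Loss: ', num2str(extractdata(lossR))]);
end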
