Can automatic differentiation in a custom deep learning layer keep track of the random numbers generated in the forward function of the layer?

I'm trying to create a gating neural network (NN) to use in a Mixture of Experts (MoE) setting; a schematic similar to what I have in mind is shown below.
The MoE network will output probabilities of selecting each expert, and the gate network (that I'm building) will stochastically pick one expert based on those probabilities at training time (only).
Since the behavior of the gate network is stochastic at training time, its forward function will generate a random vector every time it is invoked.
My understanding is that I also have to keep track of this random vector and reuse it in a backward function, because if I leave the backward pass to automatic differentiation, another random number will be generated during backpropagation and hence will ruin my training. (Right?)
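For concreteness, the kind of forward computation I have in mind looks roughly like this (just a sketch; gateForward and the variable names are made up):
function [Y,r] = gateForward(P)
    % P: NumExperts-by-BatchSize probabilities (each column sums to 1).
    cdf = cumsum(P,1);
    r = rand(1,size(P,2));               % the random vector in question
    selected = sum(r > cdf,1) + 1;       % sampled expert index per observation
    Y = zeros(size(P));                  % one-hot mask: only the selected expert passes
    Y(sub2ind(size(P),selected,1:size(P,2))) = 1;
    % For backpropagation to be consistent, the backward computation must
    % see this exact r; drawing a fresh random vector there would not match
    % the mask that was actually applied in the forward pass.
end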
My problem is that I'm not sure how I can keep track of this random vector. There are three possibilities, in my opinion:
  1. Create an ordinary property for the random number so that it can be recalled each time the backward function is called (see the sketch after this list). My attempts to do this have failed so far, as it seems that custom NN layers strangely do not keep the property value as the program runs. (Maybe it is due to the fact that such objects aren't handle objects?)
  2. Use the memory property of the custom layer. This is not allowed, as it seems that using memory in dlnetwork objects is not permitted for some reason!
  3. Use a state property. In that case I would also have to provide derivatives of the state. However, I do not want the framework to make any changes to the state, so providing that derivative is meaningless in this case.
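For example, my attempt at option 1 looked roughly like this (just a sketch with made-up names); the mask stored during forward is always empty again by the time backward runs, because layer objects are value objects rather than handles:
classdef gateLayerWithProperty < nnet.layer.Layer
    % Sketch of option 1: try to keep the random mask in an ordinary
    % property so that backward can reuse it.
    properties
        NumExperts
        LastMask      % random one-hot mask drawn during forward
    end
    methods
        function layer = gateLayerWithProperty(numExperts,name)
            layer.NumExperts = numExperts;
            layer.Name = name;
        end
        function Y = forward(layer,X)
            mask = zeros(size(X),'like',X);
            mask(randi(layer.NumExperts,1),:) = 1;
            layer.LastMask = mask;   % assignment is lost once forward returns
            Y = X .* mask;
        end
        function dLdX = backward(layer,~,~,dLdY,~)
            % LastMask is empty here, which is exactly the problem.
            dLdX = dLdY .* layer.LastMask;
        end
    end
end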
How can I solve this problem?
Thanks.

Answers (1)

Katja Mogalle on 21 Jan 2022
The automatic differentiation framework stores the actual random numbers generated during the forward pass and uses them directly during the backward pass. So you shouldn't have to do anything special and you can make use of automatic differentiation in your custom layer (by not defining your own backward function).
You can also read a bit more about automatic differentiation in MATLAB here: https://www.mathworks.com/help/deeplearning/ug/deep-learning-with-automatic-differentiation-in-matlab.html
There it says: "In other words, automatic differentiation evaluates derivatives at particular numeric values; it does not construct symbolic expressions for derivatives." Maybe this piece of information helps with understanding the behaviour when using random numbers.
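For instance, here is a tiny sketch of that behaviour (not part of the original example; the function names are made up). The derivative of y = r.*x is evaluated at the specific r drawn during the forward trace, so it always matches y/x rather than being a fresh random number:
function demoRecordedRandom
    x = dlarray(3);
    [y,dydx] = dlfeval(@scaleByRandom,x);
    % dydx equals the same r that scaled x in the forward trace.
    disp([extractdata(dydx), extractdata(y)/extractdata(x)])
end
function [y,dydx] = scaleByRandom(x)
    r = rand;                 % random number generated during forward
    y = r .* x;
    dydx = dlgradient(y,x);   % evaluates to that recorded r
end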
I also put together a small example to illustrate what I mean. It is quite simplified compared to your setup, but hopefully you can use it to better understand and play around with the autodiff framework, and transfer the idea to your implementation:
% Construct a simple network with some learnable layers and a custom layer
% in the middle which sets some channels of the data to zero.
layers = [ featureInputLayer(10)
    fullyConnectedLayer(5,Name="fc1")
    randomChannelDropLayer(5,"channelDrop")
    fullyConnectedLayer(1,Name="fc2")];
net = dlnetwork(layers);
in = dlarray(rand(10,3),'CB');

% Now let's compute gradients. Note that the custom layer does not specify
% a backward function and hence automatic differentiation is used.
for i = 1:5
    disp("Execution #"+i)
    % Every time we do a forward pass a different channel is dropped. The
    % gradient of the custom layer's output with respect to its input
    % contains zeros in the same channel that was randomly dropped during
    % forward.
    [layerOutput,layerGrad] = dlfeval(@customLayerGradients,net,in)
end

function [layerOutput,grad] = customLayerGradients(net,in)
% Compute gradients of the custom layer output with respect to its input.
% This gradient is used for backpropagation through the whole network.
[layerInput,layerOutput] = net.forward(in,Outputs=["fc1","channelDrop"]);
combinedOutput = sum(layerOutput,'all');
grad = dlgradient(combinedOutput,layerInput);
end
And here is the definition of the custom layer:
classdef randomChannelDropLayer < nnet.layer.Layer
    % randomChannelDropLayer sets one randomly selected input channel to
    % all zeros during training. The data passes through the layer
    % unchanged during prediction.

    properties
        NumChannels
    end

    methods
        function layer = randomChannelDropLayer(numChannels,name)
            layer.NumChannels = numChannels;
            layer.Name = name;
        end

        function Y = forward(layer,X)
            channelToDrop = randi(layer.NumChannels,1);
            Y = X;
            Y(channelToDrop,:,:) = 0;
        end

        function Y = predict(~,X)
            Y = X;
        end
    end
end
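If you run the script above, one quick check (not part of the original example) is to look for the all-zero row in the returned gradient; it is the channel that was dropped in that particular forward pass:
% The all-zero row of the gradient marks the randomly dropped channel,
% confirming that backpropagation reused the same random choice instead
% of drawing a new one.
droppedChannel = find(all(extractdata(layerGrad) == 0, 2))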

Version: R2021b