What is the best activation function to get actions between 0 and 1 in a DDPG network?


Sayak Mukherjee on 13 Oct 2020


https://es.mathworks.com/matlabcentral/answers/613031-what-is-the-best-activation-function-to-get-action-between-0-and-1-in-ddpg-network

Commented: awcii on 28 Jul 2023

Accepted Answer: Emmanouil Tzorakoleftherakis

I am using a DDPG network to run a control algorithm whose inputs (the actions of the RL agent, 23 in total) vary between 0 and 1. I am defining this using rlNumericSpec:

actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);

Then I am using a tanhLayer in the actor network (similar to the bipedal robot example) and then using

actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, 'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo, 'Observation',{'observation'},  'Action',{'ActorTanh1'},actorOptions);

But I feel that the model only takes the extreme values, i.e., mostly 0 and 1.

Would it be better to use a sigmoid function to get better action estimates?


Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Oct 2020


https://es.mathworks.com/matlabcentral/answers/613031-what-is-the-best-activation-function-to-get-action-between-0-and-1-in-ddpg-network#answer_514888

Hello,

With DDPG, a common choice for the final 3 layers of the actor is a fully connected layer, a tanh layer and a scaling layer. Tanh constrains the output of that layer to [-1, 1], and the scaling layer then scales and shifts the values as needed to match the specifications of the actuator in your problem.
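The scaling step is a fixed affine map from [-1, 1] to the action range. A minimal sketch in base MATLAB, assuming the limits from the question (LowerLimit 0, UpperLimit 1); the names lb, ub, scale and bias are illustrative. It also checks numerically that a sigmoid is just a rescaled tanh, sigmoid(x) = 0.5*tanh(x/2) + 0.5, which bears on the sigmoid-versus-tanh question.

```matlab
% Map a tanh output in [-1, 1] onto an action range [lb, ub].
lb = 0;                      % LowerLimit used in the question's rlNumericSpec
ub = 1;                      % UpperLimit used in the question's rlNumericSpec
scale = (ub - lb) / 2;       % half-width of the action range
bias  = (ub + lb) / 2;       % centre of the action range

x = linspace(-5, 5, 101);    % example pre-activation values
a = scale * tanh(x) + bias;  % tanh followed by the scaling step

fprintf('min action %.4f, max action %.4f\n', min(a), max(a));

% A sigmoid is a rescaled tanh: 1/(1+exp(-x)) equals 0.5*tanh(x/2) + 0.5
s = 1 ./ (1 + exp(-x));
err = max(abs(s - (0.5 * tanh(x / 2) + 0.5)));
fprintf('max |sigmoid - scaled tanh| = %.2e\n', err);
```

In the actor network, the same map corresponds to placing scalingLayer('Name','ActorScaling','Scale',scale,'Bias',bias) from Reinforcement Learning Toolbox directly after the tanhLayer. Since sigmoid differs from this only by a factor of 2 on its input, the choice between the two mainly affects the effective gain, not the output range.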

The problem here seems to be the noise that DDPG adds to the actions during training to allow sufficient exploration (for example, see step 1 here). The default noise options have a fairly high variance, so when this noise is added to the output of the tanh layer, the result ends up outside the [0, 1] range and gets clipped. This is why you are getting only the two extremes.
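To see why clipping pushes actions to the extremes, here is a back-of-the-envelope sketch in base MATLAB. It assumes, as a simplification and not the toolbox's actual Ornstein-Uhlenbeck process, that the noise on each action is Gaussian with standard deviation sigma, and it computes the probability that a noisy action leaves [0, 1] and is clipped to 0 or 1. The actor output mu = 0.5 and the sigma values are illustrative.

```matlab
% Probability that mu + sigma*randn falls outside [lb, ub] and gets clipped.
lb = 0; ub = 1;
mu = 0.5;                          % hypothetical actor output in mid-range
sigmas = [0.05 0.1 0.3 0.6];       % illustrative noise standard deviations

% Standard normal CDF via erfc (base MATLAB, no toolbox required)
gaussCdf = @(z) 0.5 * erfc(-z / sqrt(2));

pLow  = gaussCdf((lb - mu) ./ sigmas);    % probability of being clipped to lb
pHigh = gaussCdf(-(ub - mu) ./ sigmas);   % probability of being clipped to ub (by symmetry)
pClip = pLow + pHigh;

for k = 1:numel(sigmas)
    fprintf('sigma = %.2f -> P(clipped) = %.3f\n', sigmas(k), pClip(k));
end
```

With mu in the middle of the range, small noise almost never triggers clipping, while noise comparable to the range width clips a large share of the actions. If the actor output sits near 0 or 1, the share clipped to that edge is even larger.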

Try adjusting the DDPG noise options, in particular the variance (make it smaller, e.g., <= 0.1). Also, see here for some best practices when choosing noise parameters.
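A sketch of lowering the exploration noise, assuming the R2020a rlDDPGAgentOptions interface, where NoiseOptions is an Ornstein-Uhlenbeck noise object with Variance and VarianceDecayRate properties (later releases renamed Variance to StandardDeviation, so check the documentation for your release). The sample time Ts and the decay rate are placeholders, not tuned values.

```matlab
Ts = 0.05;                                          % hypothetical agent sample time
agentOpts = rlDDPGAgentOptions('SampleTime', Ts);

% Assumes R2020a property names on the Ornstein-Uhlenbeck noise options
agentOpts.NoiseOptions.Variance = 0.1;              % smaller exploration noise
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;    % placeholder: shrink noise slowly over training
```

The options object is then passed to the agent constructor together with the actor and critic representations.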

Hope that helps.


Sayak Mukherjee on 15 Oct 2020

Edited: Sayak Mukherjee on 15 Oct 2020

I should have been clearer

actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);
actInfo.Name = 'STIM'
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);

And then I am defining the actornetwork

actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(actorLayerSizes(1), 'Name', 'ActorFC1', ...
            'Weights',2/sqrt(numObs)*(rand(actorLayerSizes(1),numObs)-0.5), ... 
            'Bias',2/sqrt(numObs)*(rand(actorLayerSizes(1),1)-0.5))
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2), 'Name', 'ActorFC2', ... 
            'Weights',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),actorLayerSizes(1))-0.5), ... 
            'Bias',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),1)-0.5))
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(numAct, 'Name', 'ActorFC3', ... 
            'Weights',2*5e-3*(rand(numAct,actorLayerSizes(2))-0.5), ... 
            'Bias',2*5e-5*(rand(numAct,1)-0.5))                       
    tanhLayer('Name','ActorTanh1')
    ];
% Create actor representation
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, ...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo, ... 
                         'Observation',{'observation'}, ...
                         'Action',{'ActorTanh1'},actorOptions);

So my question is: do I need a separate scaling layer after the tanh layer, even though I have defined LowerLimit as 0 in actInfo? With this architecture my actions fluctuate between -1 and 1. If I use a sigmoid function instead, I get actions between 0 and 1.
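On this follow-up question: the LowerLimit and UpperLimit in rlNumericSpec do not rescale the actor's output (the actions observed in [-1, 1] are exactly what the tanh layer produces), so the rescaling has to happen inside the network. A sketch of the network with a scaling layer appended, assuming Reinforcement Learning Toolbox's scalingLayer; numObs and the hidden layer size are placeholders, and the custom weight initialization from the question is omitted for brevity.

```matlab
numObs = 10;   % placeholder observation size
numAct = 23;   % number of actions in the question

actorNetwork = [
    imageInputLayer([numObs 1 1], 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(64, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(numAct, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh1')                                    % output in [-1, 1]
    scalingLayer('Name', 'ActorScaling', 'Scale', 0.5, 'Bias', 0.5)    % 0.5*y + 0.5, in [0, 1]
    ];

% When creating the actor, point the action output at the scaling layer, e.g.
%   ... 'Action', {'ActorScaling'}, actorOptions);
```

This fixes the output range only; the exploration noise still needs to be small relative to the [0, 1] range for the clipped actions to stop piling up at 0 and 1.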

Sayak Mukherjee on 15 Oct 2020

Thanks.

awcii on 28 Jul 2023

@Sayak Mukherjee What about your problem? Did you solve it?


Categories

AI and Statistics Deep Learning Toolbox Applications Autonomous and Control Systems Reinforcement Learning


Release: R2020a
