Confusion about critic network architecture design in DDPG

Hello all,
I am trying to implement the following architecture for a DDPG agent in MATLAB.
"In our design and implementation, we used a 2-layer fully-connected feedforward neural network to serve as the actor network, which includes 400 and 300 neurons in the first and second layers respectively, and utilized the ReLU function for activation. In the final output layer, we used tanh(·) as the activation function to bound the actions.
Similarly, for the critic network, we also used a 2-layer fully-connected feedforward neural network with 400 and 300 neurons in the first and second layers respectively, and with ReLU for activation. Besides, we utilized the L2 weight decay to prevent overfitting."
This is taken from a paper.
Now I have implemented the actor in the following way (don't worry about the hyperparameter values):
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',[2.5;0.2618],'Bias',[-0.5;0])];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
However, I am confused about how to write the code for the critic according to that paper description. I have done the following:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    additionLayer(2,'Name','add')
    fullyConnectedLayer(400,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(300,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(1,'Name','fc5')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
%criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
But I am confused about the 'additionLayer' and the 'actionPath'. Does my implementation match the paper description?
Can anyone advise?
Thanks.

Accepted Answer

Emmanouil Tzorakoleftherakis on 25 Nov 2020
Hello,
Does this paper use DDPG as well? Are there any images that show the network architecture? If it is another algorithm, the critic may be implemented as a state-value network V(s).
DDPG uses a Q-network for the critic, which needs to take in both the state and the action (s,a). Reinforcement Learning Toolbox lets you implement this architecture by providing separate input "channels" (paths) for the state and the action. That allows you to use different layers in the two paths to extract features more efficiently. See, for example, the image below:
[Image: critic network with separate observation and action input paths merged before the final layers]
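A minimal sketch of such a two-path critic (the layer names and sizes are illustrative, following the pattern used in the toolbox DDPG examples):
% Sketch: separate observation and action paths merged by an additionLayer.
% Both paths must output the same number of units (300 here) before 'add'.
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','QValue')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
You would then create the critic representation from this layerGraph in the same way as with the concatenated version below.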
If you want, you can concatenate the observation and action inputs and use a common feature extraction path as follows:
% create a network to be used as underlying critic approximator
statePath = featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'state');
actionPath = featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action');
commonPath = [concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','StateValue')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');
plot(criticNetwork)
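To also mirror the paper's L2 weight decay on the critic, you could then build the representation from this network with an L2RegularizationFactor set in the options; a minimal sketch (the learning rate and regularization factor below are illustrative, not taken from the paper):
% Sketch: create the critic representation with L2 regularization enabled.
% The numeric values here are illustrative assumptions.
criticOptions = rlRepresentationOptions('LearnRate',1e-3,...
    'GradientThreshold',1,'L2RegularizationFactor',1e-2);
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'state'},'Action',{'action'},criticOptions);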
Hope that helps
5 comments
laha_M on 30 Nov 2020
Hello Emmanouil,
I tried training the agent, but it's performing quite poorly. I think it may be a problem with the hyperparameter values, since I have not tuned anything. Now I have two questions:
  1. I am trying to understand the effects of the hyperparameters by reading some resources, but I want to know whether there is anything in MATLAB that could help with this other than trial and error.
  2. How do I save the best-performing agent, given that I don't know the critical (reward) value in advance? Basically, I want to save the agent that achieves the maximum reward, or, say, the top-5 highest-rewarding agents.
Thanks.
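Regarding the second question, one built-in option is the agent-saving criteria in rlTrainingOptions, which save the agent whenever a chosen metric (for example, episode reward) exceeds a threshold; a minimal sketch, with illustrative values:
% Sketch: save any agent whose episode reward exceeds a threshold during training.
% The threshold, episode limits, and folder name are illustrative assumptions.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'MaxStepsPerEpisode',500,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',100,...
    'SaveAgentDirectory','savedAgents');
% trainingStats = train(agent,env,trainOpts);
The saved agents end up as MAT-files in the specified directory, so they can be reloaded and compared afterwards.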
Maha Mosalam on 8 Dec 2021
Does this network mean we make a critic with input (obs + action)? Is that what concatenationLayer means?
I simply want to make a critic with the same layers as the actor, but with input (obs + action) and output the Q-function. Does the above network do that?


More Answers (0)
