Create custom policy function for an RL DQN.

Yiyang Zhou on 20 Nov 2019
Answered: Anh Tran on 27 Mar 2020
Hi Community,
I am working on a project that requires a small modification to the DQN policy. The learned function is still Q, but instead of simply taking the argmax of Q(s,a) over actions, I want to add a few more conditions (most likely some if statements acting as hard constraints). Is it possible to make this change? If so, where should I make it?
Best regards,
Yiyang

Answers (1)

Anh Tran on 27 Mar 2020
Currently, I do not see a way to modify the DQN policy directly with the built-in rlDQNAgent. A possible workaround is to reimplement the DQN agent yourself using rlQValueRepresentation, introduced in MATLAB R2020a.
You can refer to the RL custom training loop example, where we implement vanilla policy gradients with RL Toolbox.
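In a hand-written DQN loop you also compute the one-step Q-learning target yourself, so the same hard constraints can be applied there if you want the target to consider only actions the policy is allowed to take. The following is only a sketch of that idea, not code from the example: all values are made up, and nextQ stands in for whatever your target critic returns for the next observation.

```matlab
% Sketch only: every value below is a hypothetical placeholder.
gamma   = 0.99;     % discount factor
reward  = 1.0;      % reward observed for the sampled transition
isDone  = false;    % true if the next observation is terminal

% Hypothetical Q-values of the next observation, one per discrete action
nextQ = [0.5; 2.0; 1.5];

% Hypothetical hard constraint: action 2 is not allowed in the next state
nextAllowed = [true; false; true];

% Exclude forbidden actions from the max by setting their value to -Inf
nextQ(~nextAllowed) = -Inf;

% One-step Q-learning target; no bootstrap term on terminal transitions
target = reward;
if ~isDone
    target = target + gamma * max(nextQ);
end
```

Whether to mask the target as well as the behavior policy is a design choice; leaving the target unmasked keeps the standard DQN update and only constrains the actions actually executed.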
For discrete actions, I would recommend the multi-output Q-value representation Q(o), which performs better than the single-output Q(o,a).
% Create a Q(o) critic, assuming NeuralNet, ObservationInfo, ActionInfo
% and ObsLayerName (name of the network's observation input layer) are defined
Critic = rlQValueRepresentation(NeuralNet,ObservationInfo,ActionInfo,'Observation',ObsLayerName);
% Get the Q-values of all actions for an observation RandomObservation
% (assumed to be a cell array matching ObservationInfo)
Q = getValue(Critic,RandomObservation)
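
Once you have the vector Q, the "argmax with extra conditions" from the question can be written as a masked argmax. This is a minimal sketch under stated assumptions rather than a built-in feature: the numbers in Q, the action set, and the isAllowed mask are hypothetical, and in practice Q would come from getValue and the mask from your own if statements.

```matlab
% Sketch only: Q, actions and isAllowed are hypothetical placeholders.
Q       = [1.2; 3.5; 0.7; 2.9];   % stand-in for getValue output, one value per action
actions = [-2 -1 1 2];            % stand-in for the discrete action set

% Hard constraints evaluated for the current state (here: action 2 is forbidden)
isAllowed = [true; false; true; true];

% Mask forbidden actions so they can never win the argmax
maskedQ = Q;
maskedQ(~isAllowed) = -Inf;
[~, greedyIdx] = max(maskedQ);

% Epsilon-greedy exploration restricted to the allowed actions
epsilon = 0.1;
if rand < epsilon
    allowedIdx = find(isAllowed);
    idx = allowedIdx(randi(numel(allowedIdx)));
else
    idx = greedyIdx;
end
action = actions(idx);
```

Here the unconstrained argmax would pick action 2 (Q = 3.5), while the masked version falls back to the best allowed action. If no action is allowed in some state, every entry of maskedQ is -Inf, so that case needs separate handling.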

Release

R2019b
