Reinforcement learning DDPG action fluctuations
Tech Logg Ding
on 17 Nov 2020
Commented: Karim Darwich
on 1 Jul 2024
When I tried to train the path following control example in MATLAB, the training process produced the behaviour shown in the picture.
- The steering angle is constantly fluctuating.
- The acceleration is also constantly fluctuating.
- The reward convergence is very noisy and seems to jump between a high reward and low reward.
What could be causing this issue? It has also happened in other projects of mine. One method I tried was to penalise the fluctuation in the reward function using this term, inspired by a paper published by Wang et al.:
10 * [ d/dt(current_action) * d/dt(previous_action) < 0 ]
Please let me know how to avoid this problem. Thank you very much!
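For readers who want to see the penalty term in concrete form, here is a minimal sketch in Python (not MATLAB, and not taken from the cited paper). It assumes the bracket is an indicator that equals 1 when the condition holds and 0 otherwise, that d/dt is approximated by a finite difference over the sample time, and that the result is subtracted from the reward. The function name `oscillation_penalty` and its arguments are hypothetical.

```python
def oscillation_penalty(a_prev2, a_prev, a_curr, dt, weight=10.0):
    """Return `weight` if the action's rate of change flipped sign, else 0.

    Assumption: d/dt is approximated by a backward finite difference,
    so three consecutive action samples are needed.
    """
    rate_prev = (a_prev - a_prev2) / dt  # approximate d/dt of the previous action
    rate_curr = (a_curr - a_prev) / dt   # approximate d/dt of the current action
    return weight if rate_curr * rate_prev < 0 else 0.0


# Illustrative action sequence: two direction reversals, then a steady ramp.
actions = [0.0, 0.2, 0.1, 0.3, 0.4, 0.5]
penalties = [
    oscillation_penalty(actions[i - 2], actions[i - 1], actions[i], dt=0.1)
    for i in range(2, len(actions))
]
# The penalty would then be subtracted from the step reward (assumed sign).
```

With this form, only sign reversals of the action rate are penalised, so a smooth ramp in steering or acceleration costs nothing.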
2 Comments
Emmanouil Tzorakoleftherakis
on 17 Nov 2020
Hello,
One clarification: are the scope signals you are showing on the right from during training or from after training?
Accepted Answer
Emmanouil Tzorakoleftherakis
on 22 Nov 2020
Hello,
During training, DDPG explores the action space by adding noise to the output of the actor (see step 1 here). That explains the variance during training.
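To illustrate the point about exploration, here is a minimal sketch in Python (not MATLAB). It assumes an Ornstein-Uhlenbeck noise process, which is a common choice for DDPG exploration; the `actor` function is a hypothetical stand-in for a trained network, and all parameter names and values are illustrative rather than the toolbox's settings.

```python
import math
import random


def actor(observation):
    """Hypothetical deterministic policy standing in for the trained actor."""
    return -0.5 * observation


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated, mean-reverting noise."""

    def __init__(self, mean=0.0, attraction=0.15, std=0.3, dt=0.1, seed=0):
        self.mean = mean
        self.attraction = attraction
        self.std = std
        self.dt = dt
        self.state = mean
        self.rng = random.Random(seed)

    def sample(self):
        # Pull the state toward the mean, then add a scaled Gaussian increment.
        drift = self.attraction * (self.mean - self.state) * self.dt
        diffusion = self.std * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


def clip(value, low, high):
    return max(low, min(high, value))


noise = OUNoise()
obs = 0.2  # same observation at every step, to isolate the effect of the noise

# During training: actor output plus exploration noise, clipped to the action range.
train_actions = [clip(actor(obs) + noise.sample(), -1.0, 1.0) for _ in range(50)]

# After training: the actor output alone, with no noise added.
eval_actions = [actor(obs) for _ in range(50)]
```

For a fixed observation, `eval_actions` is constant while `train_actions` wanders, which is the kind of fluctuation you would expect to see in scopes recorded during training.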
Even after training, you may see small variations in the actor output for observations that are different but close to each other. After all, you are effectively using a function approximator to approximate a nonlinear relationship between inputs (observations) and outputs (actions). If you want the policy to be more accurate near the setpoint, you could consider training further near the values of interest.
Also, the result you get on your machine may differ from the one posted in the documentation. Please see this post for an explanation.
Hope that helps.
2 Comments
sungho park
on 23 Feb 2022
For me, after training, the actor output is always constant. Can you explain why?