Can episode Q0 (DDPG agent) be used as an indicator of training quality?

5 views (last 30 days)
I am trying to use the Reinforcement Learning Toolbox to build an engine emission controller, with a DDPG agent generating the actions. I am training the agent for 3000 episodes and want to understand the training termination criteria.
  • In my case, the episode reward varies a lot for almost the entire training run (probably because I set the 'IsDone' signal to false).
  • The episode Q0, however, is unstable at the beginning and almost saturates after around 1700 episodes.
Hence, I would like to understand whether a stable episode Q0 can be used as an indicator of the learning quality of the RL agent.
PS: I am using a DDPG agent for my problem.
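Since the question is about when to stop a fixed 3000-episode run, a common alternative is to let `rlTrainingOptions` stop training once a score-based criterion is met instead of always running to `MaxEpisodes`. The sketch below is hypothetical: the step count, averaging window, and stop value are placeholders and would need to be tuned to the actual emission-control reward scale.

```matlab
% Hypothetical sketch: stop training on a moving-average reward criterion
% rather than a fixed episode count. All numeric values are placeholders.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 3000, ...                  % upper bound, as in the question
    'MaxStepsPerEpisode', 500, ...            % placeholder episode length
    'ScoreAveragingWindowLength', 50, ...     % window for the average reward
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', -100);               % placeholder target reward

% `agent` and `env` are assumed to be a DDPG agent and environment
% created elsewhere (e.g. with rlDDPGAgent and a Simulink environment).
trainingStats = train(agent, env, trainOpts);
```

With this setup, training ends early as soon as the 50-episode average reward reaches the target, which can be more meaningful than watching Q0 alone.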

Answers (1)

Ayush Modi
Ayush Modi on 17 Jan 2024
Hi Pradyumna,
I found the following answer in the community regarding Episode Q0. It is not necessary for Episode Q0 to be an indicator of the learning quality of an actor-critic agent:
"In general, it is not required for this to happen for actor-critic methods. The actor may converge first and at that point it would be totally fine to stop training."
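One way to judge how informative Q0 is for a given run is to compare it with the reward the agent actually collected: Q0 is the critic's estimate of the discounted return from the first observation, so the two curves should track each other once the critic is well trained. A minimal sketch, assuming `trainingStats` is the struct returned by `train`:

```matlab
% Hypothetical sketch: compare Episode Q0 (critic's estimate of the
% discounted return at the first step) with the realized episode reward.
% Assumes `trainingStats` was returned by train(agent, env, trainOpts).
figure;
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeQ0);
hold on;
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeReward);
legend('Episode Q0', 'Episode Reward');
xlabel('Episode');
ylabel('Value');
% If the Q0 curve flattens while staying far from the observed rewards,
% the critic has stabilized on a biased estimate; Q0 saturation alone
% does not guarantee the policy itself has converged.
```

In other words, a saturated Q0 says the critic has stopped changing, which is necessary but not sufficient evidence of good learning for an actor-critic agent like DDPG.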

Version

R2021b
