Can episode Q0 (DDPG agent) be used as an indicator of training quality?

5 views (last 30 days)
I am trying to use the Reinforcement Learning Toolbox to build an engine emission controller, with a DDPG agent generating the actions. I am training the agent for 3000 episodes and want to understand the training termination criteria.
  • In my case, the episode reward varies a lot for almost the entire training run (probably because I set the 'IsDone' signal to false).
  • The episode Q0, however, is unstable at the beginning and almost saturates after around 1700 episodes.
Hence, I would like to understand whether a stable episode Q0 can be used as an indicator of the learning quality of the RL agent.
PS: I am using a DDPG agent for my problem.
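Since the question is about when to stop a fixed 3000-episode run, a common alternative is to let `rlTrainingOptions` stop training once a score-based criterion is met instead of always running to `MaxEpisodes`. The sketch below is hypothetical: the step count, averaging window, and stop value are placeholders and would need to be tuned to the actual emission-control reward scale.

```matlab
% Hypothetical sketch: stop training on a moving-average reward criterion
% rather than a fixed episode count. All numeric values are placeholders.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 3000, ...                  % upper bound, as in the question
    'MaxStepsPerEpisode', 500, ...            % placeholder episode length
    'ScoreAveragingWindowLength', 50, ...     % window for the average reward
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', -100);               % placeholder target reward

% `agent` and `env` are assumed to be a DDPG agent and environment
% created elsewhere (e.g. with rlDDPGAgent and a Simulink environment).
trainingStats = train(agent, env, trainOpts);
```

With this setup, training ends early as soon as the 50-episode average reward reaches the target, which can be more meaningful than watching Q0 alone.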

Answers (1)

Ayush Modi
Ayush Modi on 17 Jan 2024
Hi Pradyumna,
I found the following answer in the community regarding Episode Q0. It is not necessary for Episode Q0 to be an indicator of the learning quality of an actor-critic agent:
"In general, it is not required for this to happen for actor-critic methods. The actor may converge first and at that point it would be totally fine to stop training."
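One way to judge how informative Q0 is for a given run is to compare it with the reward the agent actually collected: Q0 is the critic's estimate of the discounted return from the first observation, so the two curves should track each other once the critic is well trained. A minimal sketch, assuming `trainingStats` is the struct returned by `train`:

```matlab
% Hypothetical sketch: compare Episode Q0 (critic's estimate of the
% discounted return at the first step) with the realized episode reward.
% Assumes `trainingStats` was returned by train(agent, env, trainOpts).
figure;
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeQ0);
hold on;
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeReward);
legend('Episode Q0', 'Episode Reward');
xlabel('Episode');
ylabel('Value');
% If the Q0 curve flattens while staying far from the observed rewards,
% the critic has stabilized on a biased estimate; Q0 saturation alone
% does not guarantee the policy itself has converged.
```

In other words, a saturated Q0 says the critic has stopped changing, which is necessary but not sufficient evidence of good learning for an actor-critic agent like DDPG.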

Version

R2021b
