Agent repeats same sequence of actions each episode

Can someone please help me understand why my RL agent outputs the same sequence of actions each episode, regardless of the observations it receives from the environment? Here is an example of what I mean:
prev_state = 11.20 11.90 11.30 11.50
action = 0.00 0.00 0.00 0.00
new_state = 11.20 11.90 11.30 11.50
prev_state = 11.20 11.90 11.30 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.30 12.00 11.20 11.50
prev_state = 11.30 12.00 11.20 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.40 12.00 11.10 11.50
prev_state = 11.40 12.00 11.10 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.30 11.90 11.20 11.50
prev_state = 11.30 11.90 11.20 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.30 11.90 11.30 11.60
Episode: 1/ 2 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 5 | Episode Q0 : 1.03
prev_state = 12.00 11.20 11.70 11.50
action = 0.00 0.00 0.00 0.00
new_state = 12.00 11.20 11.70 11.50
prev_state = 12.00 11.20 11.70 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.30 11.60 11.50
prev_state = 12.00 11.30 11.60 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.40 11.50 11.50
prev_state = 12.00 11.40 11.50 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.90 11.30 11.60 11.50
prev_state = 11.90 11.30 11.60 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.90 11.30 11.70 11.60
Episode: 2/ 2 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 10 | Episode Q0 : 1.04
Let me know if you have any questions about the simulation.

Accepted Answer

Emmanouil Tzorakoleftherakis on 2 Jul 2020
Edited: Emmanouil Tzorakoleftherakis on 2 Jul 2020
Hi Braydon,
I am not sure why you are only looking at the first two episodes. RL can take thousands of episodes to converge, so the first few don't give you enough information. In fact, I ran your models for 20 episodes and the action sequence was different after a few episodes. If nothing else, I would check the reward formulation, since the reward drives how the neural network weights change and therefore how actions are selected (in addition to exploration).
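If under-exploration turns out to be the issue, one knob to try is the agent's noise settings together with a much longer training run. This is a minimal sketch, assuming a DDPG agent (which the continuous actions and the Episode Q0 logging suggest); agent, env, and the specific numbers are placeholders to adapt to your setup:

% Assumed DDPG agent: widen exploration noise and train for many more episodes.
agentOpts = rlDDPGAgentOptions;
agentOpts.NoiseOptions.Variance = 0.3;            % larger variance = more exploration
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;  % decay exploration slowly

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...          % RL often needs thousands of episodes
    'MaxStepsPerEpisode', 5, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 0);
% trainingStats = train(agent, env, trainOpts);

For reference, here are episodes 17 and 18 from the 20-episode run mentioned above: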
Episode: 17/ 20 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 85 | Episode Q0 : -120.83
1.0000e-04
prev_state = 11.90 11.90 12.00 11.20
action = 0.00 0.00 0.00 0.00
new_state = 11.90 11.90 12.00 11.20
prev_state = 11.90 11.90 12.00 11.20
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 12.00 11.90 11.20
prev_state = 12.00 12.00 11.90 11.20
action = -0.10 0.00 -0.10 0.10
new_state = 11.90 12.00 11.80 11.30
prev_state = 11.90 12.00 11.80 11.30
action = -0.10 0.10 0.00 -0.10
new_state = 11.80 12.00 11.80 11.20
prev_state = 11.80 12.00 11.80 11.20
action = 0.10 0.00 -0.10 0.00
new_state = 11.90 12.00 11.70 11.20
Episode: 18/ 20 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 90 | Episode Q0 : -83.15
1.0000e-04
prev_state = 11.70 11.90 11.50 11.60
action = 0.00 0.00 0.00 0.00
new_state = 11.70 11.90 11.50 11.60
prev_state = 11.70 11.90 11.50 11.60
action = 0.10 0.10 -0.10 0.00
new_state = 11.80 12.00 11.40 11.60
prev_state = 11.80 12.00 11.40 11.60
action = -0.10 0.00 -0.10 0.10
new_state = 11.70 12.00 11.30 11.70
prev_state = 11.70 12.00 11.30 11.70
action = -0.10 0.10 0.00 -0.10
new_state = 11.60 12.00 11.30 11.60
prev_state = 11.60 12.00 11.30 11.60
action = 0.10 0.00 -0.10 0.00
new_state = 11.70 12.00 11.20 11.60
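As a quick sanity check that the actor actually responds to its input, you can also query the policy directly for two different observations and compare the outputs. A minimal sketch, assuming your agent variable is named agent and observations are 4-element column vectors (the state values below are just taken from your log):

% Greedy (noise-free) actions for two different states.
obs1 = {[11.2; 11.9; 11.3; 11.5]};
obs2 = {[12.0; 11.2; 11.7; 11.5]};
a1 = getAction(agent, obs1);
a2 = getAction(agent, obs2);
if iscell(a1), a1 = a1{1}; a2 = a2{1}; end  % some releases return cell arrays
disp([a1 a2])  % identical columns would point to an insensitive or undertrained actor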

More Answers (0)
