Modifying the control actions to safe ones before storing in the experience buffer during SAC agent training.

Question

Ahmed R. Sayed on 18 Jan 2022


https://es.mathworks.com/matlabcentral/answers/1631210-modifying-the-control-actions-to-safe-ones-before-storing-in-the-experience-buffer-during-sac-agent

Answered: Ahmed R. Sayed on 21 Sep 2022

Hello everyone,

I am implementing a safe off-policy DRL algorithm based on SAC. An iterative convex optimization algorithm moves the actions into a safe region. However, this algorithm is applied inside the environment, so the existing rlSACAgent still stores the unsafe actions in the experience buffer and the agent cannot learn the modified actions. As a result, the iterative algorithm keeps receiving unlearned actions and takes more time to converge. My question is:

How can I store the modified actions in the experience buffer instead of the unsafe ones?

[Illustrative figure]

Many thanks for your help.
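For illustration, the kind of safety layer described above can be sketched as an iterative projection onto a convex safe set. The post does not say which algorithm or constraints are used, so this is a minimal sketch under assumptions: the safe region is taken to be a polytope (every constraint has the form g . a <= h), and Dykstra's alternating-projection algorithm stands in for the unspecified iterative convex optimization. It is written in Python rather than MATLAB, the function names (`project_halfspace`, `safe_projection`) and the example constraints are hypothetical, and it does not use the Reinforcement Learning Toolbox API.

```python
from typing import List, Sequence, Tuple

# A half-space constraint (g, h) means g . a <= h.
HalfSpace = Tuple[Sequence[float], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def project_halfspace(v: Sequence[float], g: Sequence[float], h: float) -> List[float]:
    """Euclidean projection of v onto {a : g . a <= h}; g must be nonzero."""
    violation = dot(g, v) - h
    if violation <= 0.0:
        return list(v)  # already inside this half-space
    scale = violation / dot(g, g)
    return [vi - scale * gi for vi, gi in zip(v, g)]


def safe_projection(
    action: Sequence[float],
    constraints: List[HalfSpace],
    max_cycles: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Dykstra's algorithm: project `action` onto the intersection of half-spaces."""
    x = list(action)
    # One correction (increment) vector per constraint, as Dykstra's method requires.
    increments = [[0.0] * len(x) for _ in constraints]
    for _ in range(max_cycles):
        x_start = x
        for i, (g, h) in enumerate(constraints):
            shifted = [xi + pi for xi, pi in zip(x, increments[i])]
            z = project_halfspace(shifted, g, h)
            increments[i] = [si - zi for si, zi in zip(shifted, z)]
            x = z
        # Heuristic stop: the iterate barely moved over one full cycle.
        if max(abs(a - b) for a, b in zip(x, x_start)) < tol:
            break
    return x


if __name__ == "__main__":
    # Hypothetical safe set: a1 + a2 <= 1 and a1 <= 0.25.
    safe_set = [((1.0, 1.0), 1.0), ((1.0, 0.0), 0.25)]
    raw = [2.0, 2.0]  # unsafe action proposed by the policy
    print(safe_projection(raw, safe_set))  # approximately [0.25, 0.75]
```

The output is the closest safe action to the raw one, which is why it generally differs from what the policy proposed: if only the raw action is recorded, the buffer never sees the action that was actually executed.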


Accepted Answer

Ahmed R. Sayed on 21 Sep 2022


I found the solution: use a Simulink environment and the RL Agent block with its last action input port, and feed the modified (safe) action back into that port.
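The fix works because the transition written to the buffer uses the action that was actually applied, not the policy's raw output. The sketch below shows that pattern in Python rather than MATLAB, as an illustration under assumptions: `ToyEnv`, `safety_layer` (simple clipping to [-1, 1], standing in for the convex-optimization safety layer), `ReplayBuffer`, `Transition`, and `collect` are all hypothetical. The environment returns the applied action, playing the role the last action port plays in the Simulink setup, and the collection loop stores that action.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Transition:
    obs: List[float]
    action: List[float]  # the action actually applied (safe), not the raw output
    reward: float
    next_obs: List[float]


class ReplayBuffer:
    def __init__(self, capacity: int) -> None:
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        self.data.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        return random.sample(list(self.data), batch_size)


def safety_layer(action: List[float], low: float = -1.0, high: float = 1.0) -> List[float]:
    # Hypothetical stand-in: clipping, i.e. Euclidean projection onto a box.
    return [min(max(a, low), high) for a in action]


class ToyEnv:
    """Hypothetical 1-D environment whose state is shifted by the applied action."""

    def __init__(self) -> None:
        self.state = [0.0]

    def step(self, raw_action: List[float]) -> Tuple[List[float], float, List[float]]:
        applied = safety_layer(raw_action)
        self.state = [s + a for s, a in zip(self.state, applied)]
        reward = -abs(self.state[0] - 0.5)
        # Return the applied action so the learner can record it
        # (analogous to wiring it into the last action port).
        return self.state, reward, applied


def collect(
    policy: Callable[[List[float]], List[float]],
    env: ToyEnv,
    buffer: ReplayBuffer,
    steps: int,
) -> None:
    obs = env.state
    for _ in range(steps):
        raw = policy(obs)
        next_obs, reward, applied = env.step(raw)
        buffer.add(Transition(obs, applied, reward, next_obs))  # store the safe action
        obs = next_obs
```

With a policy that always proposes `[3.0]`, every stored transition holds `[1.0]`, the action the environment really executed, so the critic is trained on executed actions rather than unsafe proposals.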
