Realize MADDPG in Matlab

Question

Huan Yang on 3 Jan 2021

Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/707558-realize-maddpg-in-matlab

Commented: MAZBAHUR KHAN on 11 Sep 2023

I am working on a multi-agent deep reinforcement learning (DRL) problem. My environment is created from the MATLAB environment template. Because MADDPG uses a common critic but different actors during training, I'm afraid I cannot implement this DRL model with the tools currently available. The experience buffer is also different from that of a DDPG agent.


Answer 1

Emmanouil Tzorakoleftherakis on 4 Jan 2021

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/707558-realize-maddpg-in-matlab#answer_590693

To create agents that share critics I believe you would have to implement that using a custom agent/training loop (see here and here). The built-in algorithms don't allow you to do that.

Also, as of R2020b, you can have (decentralized) multi-agent training in Simulink only (not with a MATLAB environment). If this is of interest to you, you can turn your MATLAB environment into a Simulink one using a MATLAB Function block.
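To make the two differences raised in the question concrete, here is a minimal sketch of the data structures a custom training loop would need: a replay buffer that stores one joint transition for all agents, and the concatenated input a shared (centralized) critic would see. It is written in plain Python purely as a language-neutral illustration and is not Reinforcement Learning Toolbox code. The names `JointReplayBuffer`, `critic_input`, and `td_target` are hypothetical, and the actor and critic networks themselves are left out.

```python
import random
from collections import deque


class JointReplayBuffer:
    """Stores one joint transition per step: every agent's observation,
    action, reward, next observation, and done flag together."""

    def __init__(self, capacity, seed=None):
        self.storage = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, obs, actions, rewards, next_obs, dones):
        # Each argument is a list with one entry per agent.
        self.storage.append((obs, actions, rewards, next_obs, dones))

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in DDPG-style replay.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def critic_input(obs, actions):
    """Concatenate all agents' observations and actions into the single
    vector that a centralized critic would take as input."""
    flat = []
    for o in obs:
        flat.extend(o)
    for a in actions:
        flat.extend(a)
    return flat


def td_target(reward, done, next_q, gamma=0.99):
    """One-step critic target. next_q is assumed to be the target critic's
    value for the next joint observation and the target actors' actions."""
    return reward + (0.0 if done else gamma * next_q)


# Usage with two hypothetical agents (2-D observations, 1-D actions).
buffer = JointReplayBuffer(capacity=1000, seed=0)
buffer.add(
    obs=[[0.0, 1.0], [2.0, 3.0]],
    actions=[[0.5], [-0.5]],
    rewards=[1.0, 0.0],
    next_obs=[[0.1, 1.1], [2.1, 3.1]],
    dones=[False, False],
)
obs, actions, rewards, next_obs, dones = buffer.sample(1)[0]
x = critic_input(obs, actions)
y = td_target(rewards[0], dones[0], next_q=2.0, gamma=0.5)
```

The point of the sketch is that each actor still acts on its own observation only, while the critic update reads the whole joint transition. That is why a per-agent DDPG buffer is not sufficient on its own.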

3 comments (1 older comment not shown)

Emmanouil Tzorakoleftherakis on 11 Sep 2023

Unfortunately, this is not possible as of R2023a unless you write your own custom loop, but this is something we are looking at.

MAZBAHUR KHAN on 11 Sep 2023

@Emmanouil Tzorakoleftherakis thank you very much for replying.

I have run into one more issue that I was hoping you could help me with.

For decentralized multi-agent training of path planning for multiple mobile robots, I have set the episode termination criterion (the isdone signal in Simulink) for each robot individually, so that a robot's episode terminates when it collides with an obstacle or reaches its goal position. However, I noticed that when one robot meets its termination criterion and the other robots have not, the episode terminates anyway and a new episode begins, with all robots assigned new initial positions. This makes training inefficient, because each robot's reward collection is interrupted by another robot's termination even though it has not reached its own termination criterion.

Is there a way to make a robot that has reached its own termination criterion wait for the other robots to reach theirs before a new episode starts for all robots?
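The thread contains no reply to this question. One idea, offered only as a sketch, is to latch each robot's own termination flag and end the episode only once every flag has been set. The plain-Python snippet below shows just that logic. The class `EpisodeDoneLatch` is hypothetical, and it rests on two unconfirmed assumptions: that a finished robot can be held in place while it waits, and that it should receive zero reward during that time. How to wire this into the Simulink model is not covered by the thread and is not shown here.

```python
class EpisodeDoneLatch:
    """Remembers which agents have finished and reports the episode as
    done only once every agent has finished."""

    def __init__(self, num_agents):
        self.finished = [False] * num_agents

    def reset(self):
        # Call at the start of every episode.
        self.finished = [False] * len(self.finished)

    def step(self, done_flags, rewards):
        # Zero the reward only for agents that had already finished before
        # this step, so the reward earned on the finishing step is kept.
        masked = [0.0 if was_done else r
                  for was_done, r in zip(self.finished, rewards)]
        # Latch: once an agent has finished, it stays finished until reset().
        self.finished = [was_done or d
                         for was_done, d in zip(self.finished, done_flags)]
        return masked, all(self.finished)


# Usage with two hypothetical robots.
latch = EpisodeDoneLatch(2)
r1, done1 = latch.step([True, False], [10.0, -1.0])   # robot 1 reaches its goal
r2, done2 = latch.step([False, False], [5.0, -1.0])   # robot 1 waits, robot 2 continues
r3, done3 = latch.step([False, True], [5.0, 10.0])    # robot 2 finishes, episode ends
```

With this arrangement the shared end-of-episode flag becomes true only on the third step, and robot 1's reward is zeroed while it waits.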
