Problems with Reinforcement Learning Toolbox Examples
4 views (last 30 days)
Averill Law
on 7 May 2020
Answered: Ryan Comeau
on 10 May 2020
For the "Stochastic Waterfall Grid World" example, what hyperparameter settings will cause it to converge? The defaults don't seem to work.
I ran the "Rocket Lander" example for the recommended 20,000 episodes with default settings, and it was still making violent crash landings. Why is this? What settings will work? The documentation says it will take 2 to 3 hours to execute, yet it literally took 50 hours on my Dell mobile workstation (CPU). I bought the computer two years ago, and I believe it has the second-fastest processor that was available at the time. Thank you for your assistance.
0 comments
Accepted Answer
Ryan Comeau
on 10 May 2020
Hello,
I cannot answer the first part of your question, as I have not tried the waterfall grid world. I can, however, deal with the second part.
First off, the execution time for any reinforcement learning model is extremely variable. Recall that we are not updating weights against a ground truth as in supervised deep learning. The algorithm uses the differential equations of motion provided in the environment (these specify how velocity and acceleration change based on the environment and the actions) and takes time steps to propagate the object through the environment. If the agent uses the maximum number of steps you allow per episode, it has to evaluate these differential equations at every step. If the algorithm takes fewer steps per episode, it will be quicker.
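For reference, here is where that per-episode step budget lives. This is a minimal sketch, assuming the agent and env objects built earlier in the Rocket Lander example script; the episode count, step cap and stopping threshold below are placeholders, not the example's published settings.
% Sketch only: numbers are placeholders, agent/env come from the example script.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',20000, ...
    'MaxStepsPerEpisode',600, ...         % fewer steps -> fewer evaluations of the dynamics per episode
    'ScoreAveragingWindowLength',100, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',430, ...          % placeholder stopping threshold
    'Plots','training-progress');
% trainingStats = train(agent,env,trainOpts);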
In terms of having the best CPU, if you're running on a mobile (laptop) platform, odds are your hardware is thermal throttling (reducing clock speed because the CPU is too hot) after 20-30 minutes. This could slow it down a lot. If possible, switch to a desktop platform and get a good cooling fan.
The Rocket Lander is a demon, I will admit. I've run it a bunch and reached the following conclusions (a sketch of these settings follows the list):
- Lower the learning rate; I've set mine to 1e-4.
- Lower the clip factor; mine is 0.1.
- Increase the mini-batch size; mine is 128.
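Here is a minimal sketch of how those three settings map onto the toolbox options, assuming the PPO agent from the Rocket Lander example. criticNet, actorNet, obsInfo, actInfo and the 'observation' layer name are placeholders for objects built earlier in that example script.
% Learning rate is set on the actor/critic representations:
reprOpts = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic = rlValueRepresentation(criticNet,obsInfo, ...
    'Observation',{'observation'},reprOpts);   % layer name must match your network
actor = rlStochasticActorRepresentation(actorNet,obsInfo,actInfo, ...
    'Observation',{'observation'},reprOpts);
% Clip factor and mini-batch size are set on the agent options:
agentOpts = rlPPOAgentOptions('ClipFactor',0.1,'MiniBatchSize',128);
agent = rlPPOAgent(actor,critic,agentOpts);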
Also remember that this is near bleeding-edge computer science and is subject to just not working sometimes. I highly recommend banging your head against the table to learn and get better results; it's very satisfying.
Hope this helps,
RC
0 comments
More Answers (0)