Episode Q0 increases exponentially

DAMODARAN B.K

16 Feb. 2021

1 Answer

Updated 17 Feb. 2021


Can anyone explain why Episode Q0 in RL increases exponentially after the reward has converged to a suboptimal policy?


Answers (1)

Emmanouil Tzorakoleftherakis on 16 Feb. 2021

Hello,

Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
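As a rough illustration of the normalization idea, here is a minimal sketch in plain Python. It is not Reinforcement Learning Toolbox code: `RunningNormalizer` and `scale_action` are hypothetical helper names, and the choice of a running mean/standard deviation for observations and rewards, plus a linear map of actions onto [-1, 1], is only one common way to do it.

```python
import math


class RunningNormalizer:
    """Running mean/std of a scalar signal (Welford's algorithm).

    Hypothetical helper for illustration; not part of any toolbox.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean
        self.eps = eps  # avoids division by zero for constant signals

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Population standard deviation; 1.0 until two samples are seen."""
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x):
        """Shift and scale x to roughly zero mean and unit variance."""
        return (x - self.mean) / (self.std + self.eps)


def scale_action(a, low, high):
    """Linearly map an action from [low, high] onto [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0
```

In use, one normalizer per observation channel (and one for the reward) would be updated with each new sample before the value is passed to the agent. Keeping these signals near unit scale tends to keep the critic's targets in a range the network can fit.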

Hope this helps


DAMODARAN B.K on 17 Feb. 2021

Edited: DAMODARAN B.K on 17 Feb. 2021

Is Episode Q0 the critic network output or the target value?
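For context, the Reinforcement Learning Toolbox documentation describes Episode Q0 as the critic's estimate of the discounted long-term reward at the start of each episode, given the initial observation. Under that reading, a quick sanity check is to compare Q0 with the largest discounted return the environment could produce. The sketch below is plain Python rather than toolbox code, and it assumes rewards are bounded by a known `r_max`. The function names and the `slack` factor are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Discounted sum r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def return_bound(r_max, gamma):
    """Upper bound on |discounted return| when |r_t| <= r_max and gamma < 1."""
    return r_max / (1.0 - gamma)


def q0_is_plausible(q0, r_max, gamma, slack=1.1):
    """True if a critic estimate lies within the achievable return range.

    slack is an assumed tolerance for ordinary approximation error.
    """
    return abs(q0) <= slack * return_bound(r_max, gamma)
```

If logged Q0 values keep growing past this bound while the episode reward stays flat, the critic estimate is likely diverging rather than tracking the true return.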


Products

Reinforcement Learning Toolbox
