Episode Q0 increases exponentially

DAMODARAN B.K on 16 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021
Can anyone explain why Episode Q0 in RL increases exponentially after the reward has converged to a suboptimal policy?

Answers (1)

Emmanouil Tzorakoleftherakis on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like this.
Hope this helps.
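To illustrate the normalization suggestion, here is a minimal sketch of standardizing a signal (for example, per-step rewards or one observation channel) with a running mean and standard deviation computed by Welford's algorithm. This is a generic technique written in Python for illustration; the class name `RunningNormalizer` is hypothetical and is not part of Reinforcement Learning Toolbox or any other MATLAB product.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: standardize scalars using running statistics.

    Uses Welford's online algorithm so the mean and variance can be
    updated one sample at a time without storing past samples.
    """

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero when the spread is tiny

    def update(self, x: float) -> None:
        # Welford update: shift the mean, then accumulate squared deviation
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; fall back to 1.0 until two samples exist
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / (self.count - 1))

    def normalize(self, x: float) -> float:
        # Map x to roughly zero mean and unit variance
        return (x - self.mean) / (self.std + self.eps)


if __name__ == "__main__":
    norm = RunningNormalizer()
    raw_rewards = [100.0, 250.0, -50.0, 400.0, 175.0]
    for r in raw_rewards:
        norm.update(r)
    print([round(norm.normalize(r), 3) for r in raw_rewards])
```

Keeping inputs and rewards on a scale near 1 tends to keep critic targets small, which can make value estimates less prone to blowing up. For rewards, some implementations only divide by the standard deviation (without subtracting the mean) so that the sign of the reward is preserved; which variant suits a given problem is a judgment call.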
DAMODARAN B.K on 17 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021
Is Episode Q0 the critic network output or the target value?
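For context, the Reinforcement Learning Toolbox documentation describes Episode Q0 as the critic's estimate of the discounted long-term reward at the start of each episode, given the initial observation; that is, it is read from the critic rather than from a target computation. A rough sanity check is to compare it against the discounted return actually collected in an episode, and against the bound r_max / (1 - gamma) that holds when every per-step reward satisfies |r| <= r_max. The sketch below is a generic Python illustration; the function names are hypothetical and not toolbox functions.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G_0 = sum_k gamma**k * r_k for one episode's rewards."""
    g = 0.0
    # Iterate backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def value_magnitude_bound(r_max: float, gamma: float) -> float:
    """Upper bound on |Q| for rewards bounded by r_max and 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return r_max / (1.0 - gamma)


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    print(discounted_return(episode_rewards, gamma=0.9))
    print(value_magnitude_bound(r_max=1.0, gamma=0.9))
```

If the reported Q0 keeps growing well past both the observed discounted return and this bound while the episode reward stays flat, the critic is overestimating, which is a symptom worth addressing (for example with the normalization discussed above).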

