Episode Q0 increases exponentially
Can anyone explain why Episode Q0 in RL training increases exponentially after the reward has converged to that of a suboptimal policy?

Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
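As a rough illustration of what normalizing could look like, here is a minimal, language-agnostic sketch in Python. It is not Reinforcement Learning Toolbox code: the helper names, the signal bounds, and `reward_scale` are all hypothetical, and the same arithmetic would go wherever your environment computes its observation, reward, and action signals.

```python
import numpy as np

def scale_to_unit(x, low, high):
    """Linearly map x from the assumed range [low, high] to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return 2.0 * (x - low) / (high - low) - 1.0

def unscale_from_unit(u, low, high):
    """Inverse of scale_to_unit: map u from [-1, 1] back to [low, high]."""
    u = np.asarray(u, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return low + (u + 1.0) * (high - low) / 2.0

def scale_reward(r, reward_scale):
    """Divide the raw reward by a constant so per-step rewards are roughly O(1)."""
    return r / reward_scale

# Hypothetical bounds for a two-element observation (assumed, not from the thread).
obs_low, obs_high = np.array([0.0, -10.0]), np.array([100.0, 10.0])

raw_obs = np.array([50.0, 10.0])
norm_obs = scale_to_unit(raw_obs, obs_low, obs_high)      # what the agent would see

# The agent outputs an action in [-1, 1]; the plant receives physical units.
agent_action = 0.5
plant_action = unscale_from_unit(agent_action, -2.0, 2.0)

scaled_reward = scale_reward(-500.0, reward_scale=1000.0)
```

The idea is that the agent only ever sees signals of roughly unit magnitude, which tends to keep the critic's targets in a modest range. The bounds and the reward scale would have to come from what you know about your own system.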
Hope this helps
1 comment
DAMODARAN B.K
on 17 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021