Determining the reward value at which to stop training an RL agent

In an example that uses an RL agent, I saw this sentence:
  • Stop training when the agent receives an average cumulative reward greater than -355 over 100 consecutive episodes. At this point, the agent can control the level of water in the tank.
How did the author arrive at the exact value of -355 over 100 episodes? Are there any tips for knowing when to stop training at a specific point, before the agent's performance gets worse?
Thank you in advance.

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 25 Jan 2023
For some problems you may be able to calculate the maximum reward that can be collected in an episode, and you can then use that value in the training settings. In general, though, there is no recipe that tells you when to stop training. You would typically need to train for a large number of episodes and watch how the training progresses; that can help you identify what a good average reward is. You could also just train for a set number of episodes instead (similar to how you would train for a certain number of epochs in supervised learning).
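As a rough illustration of the two stopping rules described above (an average-reward threshold over a window of episodes, or a fixed episode budget), here is a minimal Python sketch. It is not the Reinforcement Learning Toolbox implementation: the function name `find_stop_episode` and its parameters are hypothetical, and it assumes the averaging window must be full before the threshold is checked. The defaults of -355 and 100 are simply the numbers quoted in the question.

```python
from collections import deque


def find_stop_episode(episode_rewards, threshold=-355.0, window=100, max_episodes=5000):
    """Return (episode, reason) for the first episode at which training would stop.

    Hypothetical helper: stops when the mean reward over the last `window`
    episodes exceeds `threshold`, or when `max_episodes` is reached.
    """
    recent = deque(maxlen=window)  # holds only the most recent `window` rewards
    episode = 0
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # Assumption: only test the threshold once the window is full.
        if len(recent) == window and sum(recent) / window > threshold:
            return episode, "average reward threshold reached"
        if episode >= max_episodes:
            return episode, "episode budget exhausted"
    return episode, "no more episodes"


if __name__ == "__main__":
    # Synthetic rewards for illustration only: poor at first, then a plateau.
    rewards = [-1000.0] * 50 + [-300.0] * 200
    print(find_stop_episode(rewards))
```

In practice the threshold is usually read off the plateau of the training curve from an earlier long run, which is why a value such as -355 tends to be specific to one environment and reward function.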
Hope that helps

More Answers (1)

Sam Chak
Sam Chak on 17 Oct 2022
There is an option to set the StopTrainingValue.
  1 Comment
H. M.
H. M. on 17 Oct 2022
Thank you, @Sam Chak, for answering.
What I mean is: how did the author know that once the average cumulative reward reaches -355, the agent can control the level? Why -355 exactly?
