PPO algorithm training problem in Reinforcement Learning Toolbox

The description of the PPO training algorithm states: “For each experience sequence that does not contain a terminal state, N is equal to the ExperienceHorizon option value. Otherwise, N is less than ExperienceHorizon and S_N is the terminal state.”
My first question: suppose N is smaller than ExperienceHorizon and also smaller than the mini-batch size, and this happens for multiple consecutive episodes. When does the algorithm update the parameters in that case?
My second question: when are the PPO parameters updated with the following settings?
agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',10000,...
    'MiniBatchSize',64,...
    'NumEpoch',3);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',10000,...
    'MaxStepsPerEpisode',30);