Resume training for PPO agent
Harry Dunn on 8 Apr 2023

Answered: Emmanouil Tzorakoleftherakis on 10 Apr 2023
I am trying to train a PPO agent whose environment essentially reads in a text file containing data obtained from a robot dynamics simulator (Webots). This works, but random CPU spikes cause it to crash, because both the robotics simulator and MATLAB have to run simultaneously (although it typically completes at least a few thousand episodes before crashing).
Following the approach in this answer, I save the agent after every episode, then reload the agent and re-run training: https://uk.mathworks.com/matlabcentral/answers/495436-how-to-train-further-a-previously-trained-agent 
use_previous_agent=true;
if use_previous_agent
    % Load the previously saved (pre-trained) agent
    load("Filepath...",'saved_agent');
    agent = saved_agent;
else
    % Create a new agent
    agent = rlPPOAgent(actor,critic,agentOpts);
    agent.AgentOptions.CriticOptimizerOptions.LearnRate = 3e-3;
    agent.AgentOptions.ActorOptimizerOptions.LearnRate = 3e-3;
end
trainOpts = rlTrainingOptions(...
    MaxEpisodes=100000,...
    MaxStepsPerEpisode=600000,...
    Plots="training-progress",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=4300,...
    ScoreAveragingWindowLength=100, ...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=10, ...
    SaveAgentDirectory=fullfile(pwd,"run1","Agents"));
trainingStats = train(agent, env, trainOpts);
I'm not sure this is correct, because the linked answer is specifically about DDPG, where you also have to deal with resetting the experience buffer, etc. Does anyone with experience using PPO agents know whether this is a viable approach?
Thanks in advance
Accepted Answer
Emmanouil Tzorakoleftherakis on 10 Apr 2023
PPO does not use an experience buffer, so you should be fine loading the saved agent to resume training. If you are using advantage normalization, though, the information accumulated for normalizing advantages in the previous session won't transfer over to the new training session.
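Since the agent itself is all that needs to carry over for PPO, resuming after a crash mostly comes down to picking the most recent saved file. Below is a minimal sketch of that step. It assumes `train` writes files named `AgentK.mat` (K = episode number), each holding a variable named `saved_agent`, into `SaveAgentDirectory`; please check that naming against your MATLAB release. `findLatestSavedAgent` is a hypothetical helper, not a Reinforcement Learning Toolbox function. It compares K numerically, because a plain alphabetical sort would rank `Agent9.mat` above `Agent20.mat`.

```matlab
function latestFile = findLatestSavedAgent(agentDir)
% findLatestSavedAgent  Hypothetical helper (not part of Reinforcement
% Learning Toolbox): return the path of the AgentK.mat file with the
% largest episode number K in agentDir, or "" if none is found.
% Assumes the AgentK.mat naming used by train() when SaveAgentCriteria
% is set; verify this against your MATLAB release.
    latestFile = "";
    files = dir(fullfile(agentDir, 'Agent*.mat'));
    if isempty(files)
        return
    end
    % Pull the episode number K out of names like 'Agent120.mat';
    % names that do not match the pattern give an empty result.
    tokens = regexp({files.name}, '^Agent(\d+)\.mat$', 'tokens', 'once');
    valid = ~cellfun(@isempty, tokens);
    if ~any(valid)
        return
    end
    % Compare K numerically so that Agent20.mat ranks above Agent9.mat
    episodes = cellfun(@(t) str2double(t{1}), tokens(valid));
    candidates = files(valid);
    [~, idx] = max(episodes);
    latestFile = string(fullfile(agentDir, candidates(idx).name));
end
```

A possible way to use it, reusing `env` and `trainOpts` from the question:

```matlab
agentFile = findLatestSavedAgent(fullfile(pwd, "run1", "Agents"));
s = load(agentFile, "saved_agent");
agent = s.saved_agent;
% Save the resumed session into a fresh folder (see note below)
trainOpts.SaveAgentDirectory = fullfile(pwd, "run2", "Agents");
trainingStats = train(agent, env, trainOpts);
```

As far as we understand, each new call to `train` counts episodes from 1 again, so pointing `SaveAgentDirectory` at a new folder for each resumed session avoids overwriting the files saved by the earlier session.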