How to continue training a DQN agent in the Reinforcement Learning Toolbox?

11 views (last 30 days)
I have created a neural network and DQN agent with the MATLAB Reinforcement Learning Toolbox, using the following code:
createEnvironment
createDQNetwork % Produces critic, criticOptions & GPU
createDQNOptions % Produces agentOptions
createDQNTrainingOptions % Produces trainOptions & parallel processing
agent = rlDQNAgent(critic,agentOptions); % Create the agent
validateEnvironment(env)
After this, I begin training the agent using the following code.
trainingResults = train(agent,env,trainOptions);
curDir = pwd;
saveDir = 'savedAgents';
cd(saveDir)
save(['trainedAgent' datestr(now,'mm_dd_yyyy_HHMM')],'agent','-v7.3');
% save(['trainedAgent' datestr(now,'mm_dd_yyyy_HHMM')],'agent','trainingResults','-v7.3');
cd(curDir)
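When resuming in a brand-new MATLAB session, a sketch like the following (assuming the file name pattern used in the save call above) reloads the newest saved agent first:
% Hedged sketch: reload the most recently saved agent in a new session.
d = dir(fullfile('savedAgents','trainedAgent*.mat'));
[~, idx] = max([d.datenum]); % pick the newest file
s = load(fullfile(d(idx).folder, d(idx).name));
agent = s.agent; % then call train(agent,env,trainOptions)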
The agent begins training successfully and I can observe it learning how to control the system. Due to system memory constraints, I need to run the training process multiple times. When the first training run finishes, I simply run the following command again:
trainingResults = train(agent,env,trainOptions);
as I don't need to create a brand-new agent, network, environment, etc. from scratch. However, when training begins the second time, the agent's behaviour has obviously reverted to what it was when the agent was first created. How can I begin retraining the agent while keeping the progress from the previous training session?
Edit: My system has 64 GB of RAM; getting more isn't really an option.

Accepted Answer

Emmanouil Tzorakoleftherakis on 27 Jan 2020
Hi James,
It looks like the experience buffer is the culprit here. Have a look at this question for a suggestion. In short, you need to make sure you also save the experience buffer when you stop training, so that the agent doesn't resume learning from an empty buffer. I would also recommend reducing the experience buffer size just enough to lower memory utilization and make it feasible to train in one go.
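For reference, a minimal sketch of the relevant settings, using the SaveExperienceBufferWithAgent and ResetExperienceBufferBeforeTraining properties of rlDQNAgentOptions (set them before creating the agent):
% Keep the experience buffer inside the agent when it is saved, and do not
% wipe the buffer when train() is called again on the same agent.
agentOptions.SaveExperienceBufferWithAgent = true;
agentOptions.ResetExperienceBufferBeforeTraining = false;
agent = rlDQNAgent(critic, agentOptions);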
  2 comments
James Norris on 31 Jan 2020
Hi Emmanouil,
Thanks for the response, this has helped a lot. In addition, for anyone else with this problem: the exploration factor randomises the agent's actions during the initial episodes of training, which can cause regression to bad habits when a new session starts. The exploration decay should be spread across all of the training sessions, not reset each time. A sketch of that adjustment follows.
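A minimal sketch, assuming the epsilon-greedy fields of rlDQNAgentOptions (Epsilon decays by a factor of (1 - EpsilonDecay) per step) and a hypothetical stepsSoFar count of training steps completed in earlier sessions:
% Continue the epsilon decay schedule instead of restarting it.
% stepsSoFar is a hypothetical count of steps already completed.
expl = agentOptions.EpsilonGreedyExploration;
resumedEpsilon = max(expl.EpsilonMin, expl.Epsilon*(1 - expl.EpsilonDecay)^stepsSoFar);
agentOptions.EpsilonGreedyExploration.Epsilon = resumedEpsilon;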
Maedeh on 6 Feb 2020
Hi,
I have created a DDPG agent using the MATLAB Reinforcement Learning Toolbox inverted pendulum example.
How can I save the experience buffer so I can analyze it?


More Answers (0)
