
Reinforcement Learning DQN Training Convergence Problem

Gülin Sayal on 6 Jun 2021
Hi everyone,
I am designing an energy management system for a vehicle and using a DQN agent to optimize its fuel consumption. Here are the relevant lines from my code:
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
nI = obsInfo.Dimension(1);     % number of observations
nL = 24;                       % neurons per hidden layer
nO = numel(actInfo.Elements);  % number of discrete actions
% Critic network: one Q-value output per discrete action
dnn = [
    featureInputLayer(nI,'Name','state','Normalization','none')
    fullyConnectedLayer(nL,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(nL,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(nO,'Name','output')];
criticOpts = rlRepresentationOptions('LearnRate',0.00025,'GradientThreshold',1);
critic = rlQValueRepresentation(dnn,obsInfo,actInfo,'Observation',{'state'},criticOpts);
agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',1000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',32);
agentOpts.EpsilonGreedyExploration.Epsilon = 1;
agentOpts.EpsilonGreedyExploration.EpsilonMin = 0.2;
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.0050;
agentObj = rlDQNAgent(critic,agentOpts);
maxepisodes = 10000;
maxsteps = ceil(T/Ts);         % T and Ts are defined with the Simulink model
trainingOpts = rlTrainingOptions('MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeReward',...
    'StopTrainingValue',0);
trainingStats = train(agentObj,env,trainingOpts);
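As a side note, here is my understanding of how the epsilon schedule above plays out, assuming the documented per-step update Epsilon = Epsilon*(1-EpsilonDecay), floored at EpsilonMin (please correct me if I have this wrong):

% Hedged sketch of the exploration schedule above, assuming the
% per-step update Epsilon = Epsilon*(1-EpsilonDecay), floored at EpsilonMin
k = 0:1000;
eps = max(0.2,(1 - 0.0050).^k);
k(find(eps <= 0.2,1))   % about 322 steps until epsilon reaches EpsilonMin

So, if I understand the update rule correctly, exploration collapses to its floor of 0.2 after only a few hundred training steps.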
The problem is that the rewards do not converge over training. Moreover, the long-term estimated cumulative reward Q0 diverges. I have already read some posts on this topic here, and following them I normalized my action and observation spaces, which did not help. I also tried adding a scaling layer right before the last fullyConnectedLayer (sketched below), which did not help either. You can find my training progress curves in the attachment.
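For clarity, this is roughly the scaling-layer variant I tried (a minimal sketch; the layer name and 'Scale' value here are placeholders, not my exact settings):

% Hedged sketch: scalingLayer inserted right before the last fullyConnectedLayer;
% the 'Scale' value shown is a placeholder
dnn = [
    featureInputLayer(nI,'Name','state','Normalization','none')
    fullyConnectedLayer(nL,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(nL,'Name','fc2')
    reluLayer('Name','relu2')
    scalingLayer('Name','scale','Scale',0.1)
    fullyConnectedLayer(nO,'Name','output')];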
So, what else can I try so that Q0 stops diverging and the episode rewards converge?
I would also really like to know how Q0 is calculated; it should not be possible for my model to produce such large long-term estimated rewards. My current understanding is sketched below.
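My current understanding of Q0, which may well be wrong: it is the critic's estimate of the discounted long-term reward from the episode's initial observation, i.e. the maximum Q-value over the discrete actions at s0. In code, something like this (obs0 is just a placeholder for the initial observation):

% Hedged sketch of how I believe Episode Q0 is computed
obs0 = {zeros(nI,1)};        % placeholder for the initial observation s0
q = getValue(critic,obs0);   % one Q-value per discrete action
Q0 = max(q)                  % max_a Q(s0,a), plotted as Episode Q0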
Best Regards,
Gülin

Answers (0)
