
Load trained reinforcement learning multi-agents into sim

3 views (last 30 days)
Chao Wang on 16 Apr 2021
Answered: Chao Wang on 7 Dec 2021
Hello,
I trained four agents with the Q-learning method in reinforcement learning. After training, I loaded the trained agents into the simulation, but they always choose the same action and never change it, so they do not reproduce the behavior seen during training.
Here is my code:
clc;
clear;
mdl = 'FOUR_DG_0331';
open_system(mdl);
agentBlk = ["FOUR_DG_0331/RL Agent1", "FOUR_DG_0331/RL Agent2", "FOUR_DG_0331/RL Agent3", "FOUR_DG_0331/RL Agent4"];
oInfo = rlFiniteSetSpec([123,456,789]);
aInfo = rlFiniteSetSpec([150,160,170]);
aInfo1 = rlFiniteSetSpec([150,170]);
obsInfos = {oInfo,oInfo,oInfo,oInfo};
actInfos = {aInfo1,aInfo,aInfo,aInfo};
env = rlSimulinkEnv(mdl,agentBlk,obsInfos,actInfos);
Ts = 0.01;
Tf = 4;
rng(0);
qTable1 = rlTable(oInfo,aInfo1);
qTable2 = rlTable(oInfo,aInfo);
qTable3 = rlTable(oInfo,aInfo);
qTable4 = rlTable(oInfo,aInfo);
criticOpts = rlRepresentationOptions('LearnRate',0.1);
Critic1 = rlQValueRepresentation(qTable1,oInfo,aInfo1,criticOpts);
Critic2 = rlQValueRepresentation(qTable2,oInfo,aInfo,criticOpts);
Critic3 = rlQValueRepresentation(qTable3,oInfo,aInfo,criticOpts);
Critic4 = rlQValueRepresentation(qTable4,oInfo,aInfo,criticOpts);
% Agent options (QAgent_opt) are defined here; code omitted in the post
% ...
agent1 = rlQAgent(Critic1,QAgent_opt);
agent2 = rlQAgent(Critic2,QAgent_opt);
agent3 = rlQAgent(Critic3,QAgent_opt);
agent4 = rlQAgent(Critic4,QAgent_opt);
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = ceil(Tf/Ts);
trainOpts.StopTrainingCriteria = "EpisodeCount";
trainOpts.StopTrainingValue = 1000;
trainOpts.SaveAgentCriteria = "EpisodeCount";
trainOpts.SaveAgentValue = 15;
trainOpts.SaveAgentDirectory = "savedAgents";
trainOpts.Verbose = false;
trainOpts.Plots = "training-progress";
doTraining = false;
if doTraining
stats = train([agent1, agent2, agent3, agent4],env,trainOpts);
else
load(trainOpts.SaveAgentDirectory +"/Agents16.mat",'agent');
simOpts = rlSimulationOptions('MaxSteps',ceil(Tf/Ts));
experience = sim(env,[agent1 agent2 agent3 agent4 ],simOpts)
end
The result of the sim call is that all four agents always choose action 150. They do not choose other actions the way they did during training.
I don't understand why... Can somebody help me out on this?
1 comment
FATAO ZHOU on 28 Sep 2021
I may have the same question as you. Here is mine:
I want to load the same pretrained agent into different RL Agent blocks, but when I use the load function it only loads the first one (RL Agent1); the second one (RL Agent2) does not work. Maybe we can solve it the same way, but I do not know how yet.


Answers (2)

Ari Biswas on 16 Apr 2021
It could mean that the agents have converged to suboptimal policies. You can train the agents for longer to see if there is an improvement. Note that the behavior you see during training has exploration associated with it. If the EpsilonGreedyExploration.Epsilon parameter has not decayed much, then the agents are still exploring. This could be one reason why you see a difference in the sim behavior.
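For reference, these exploration settings live on the agent options object. The following is only a sketch of how the omitted QAgent_opt from the question might be configured; the specific values are illustrative, not taken from the post.
% Sketch of explicit epsilon-greedy settings for the Q agents (illustrative values).
% A slow EpsilonDecay keeps the agents exploring during most of training,
% which is why actions vary during train but a single greedy action appears in sim.
QAgent_opt = rlQAgentOptions('SampleTime',Ts);
QAgent_opt.EpsilonGreedyExploration.Epsilon      = 1;      % start fully exploratory
QAgent_opt.EpsilonGreedyExploration.EpsilonDecay = 0.005;  % decay applied each step
QAgent_opt.EpsilonGreedyExploration.EpsilonMin   = 0.01;   % exploration floor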
2 comments
Chao Wang on 17 Apr 2021
After training, how can I see the values in the Q table? Every time I open the Q table, all the values are 0.
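One likely reason the workspace tables read zero is that qTable1..qTable4 were only used to construct the critics and are not updated by training. A minimal sketch (not from the thread) of reading the learned values through the trained agent's critic instead, assuming a trained agent1 is available in the workspace or reloaded from a saved .mat file:
% Sketch: inspect the learned Q-values via the critic held by the trained agent.
trainedCritic1 = getCritic(agent1);                       % critic inside the agent
learnedParams  = getLearnableParameters(trainedCritic1);  % cell array of parameters
disp(learnedParams{1})                                    % table of Q-values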
Chao Wang on 19 Apr 2021
Edited: Chao Wang on 19 Apr 2021
I've tried training for longer, but the agents still don't work. Is this loading method wrong?



Chao Wang on 7 Dec 2021
Maybe you can try this:
agent1 = load("Agent100.mat");
agent2 = load("Agent90.mat");
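Note that load returns a struct of the variables stored in the .mat file rather than the agent object itself, so the agent usually has to be pulled out of that struct before calling sim. A minimal sketch, assuming the files were produced by the SaveAgent options above and that the saved variable is named saved_agent (check with whos('-file',...) if unsure):
% Sketch: extract the trained agents from the auto-saved MAT-files.
s1 = load("savedAgents/Agent100.mat");
s2 = load("savedAgents/Agent90.mat");
agent1 = s1.saved_agent;   % variable name assumed; inspect the file if it differs
agent2 = s2.saved_agent;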

Release

R2020b
