RL agent does not learn properly

28 views (last 30 days)
Franz Schnyder on 20 Mar 2023
Commented: Franz Schnyder on 17 Aug 2023
Hello everyone
I am trying to learn the Reinforcement Learning Toolbox and want to control the speed of a DC motor with an RL agent, replacing a PI controller. I based my setup on the water tank example, but I am running into problems during training.
First, the agent tends to settle at either the minimum (0 rpm) or the maximum (6000 rpm) and then no longer changes its output, even though it had already achieved good rewards in earlier episodes.
In my reward function I use the error between the target and the measured speed, expressed as a percentage. When I add a penalty so that the agent does not stay at 0 rpm, it still stays at 0 rpm and does not explore the surrounding region. I also have trouble removing the remaining steady-state control error.
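To make the idea concrete, here is a rough MATLAB sketch of what my reward does (the actual reward is built as a Simulink block; the penalty threshold and weights below are only placeholders, not my real values):
% Sketch of the reward idea: percentage speed error plus a penalty near 0 rpm.
% The threshold (100 rpm) and penalty weight (50) are placeholders for illustration.
function r = speedReward(omega_ref, omega_meas)
    pct_error = abs(omega_ref - omega_meas)/omega_ref*100;  % speed error in percent
    r = -pct_error;                                          % smaller error -> higher reward
    if omega_meas < 100                                      % discourage sitting near 0 rpm
        r = r - 50;
    end
end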
Below are the code and some pictures of the training.
close all
% DC motor parameters
R = 7.03;          % armature resistance [Ohm]
L = 1.04*10^-3;    % armature inductance [H]
J = 44.2*10^-7;    % rotor inertia [kg*m^2]
a = 2.45*10^-6;    % viscous friction coefficient
Kn = 250*2*pi/60;  % speed constant (250 rpm/V converted to rad/(s*V))
Km = 38.2*10^-3;   % torque constant [Nm/A]
% action: motor voltage, limited to 0..24 V ('spannung' = voltage)
actInfo = rlNumericSpec([1 1],'LowerLimit', 0, 'UpperLimit', 24);
actInfo.Name = 'spannung';
obsInfo = rlNumericSpec([3 1],...
    'LowerLimit',[-inf -inf -inf]',...
    'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured rpm';
env = rlSimulinkEnv("DCMotorRL2", 'DCMotorRL2/RL Agent',...
    obsInfo, actInfo);
env.ResetFcn = @(in)localResetFcn(in);
Ts = 0.1; %agent sample time
Tf = 20; %simulation time
rng(0)
% critic: Q-value network with separate state and action paths joined by addition
statePath = [
    featureInputLayer(obsInfo.Dimension(1),Name="netObsIn")
    fullyConnectedLayer(50)
    reluLayer
    fullyConnectedLayer(25,Name="CriticStateFC2")];
actionPath = [
    featureInputLayer(actInfo.Dimension(1),Name="netActIn")
    fullyConnectedLayer(25,Name="CriticActionFC1")];
commonPath = [
    additionLayer(2,Name="add")
    reluLayer
    fullyConnectedLayer(1,Name="CriticOutput")];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork, ...
    "CriticStateFC2", ...
    "add/in1");
criticNetwork = connectLayers(criticNetwork, ...
    "CriticActionFC1", ...
    "add/in2");
criticNetwork = dlnetwork(criticNetwork);
figure
plot(criticNetwork)
critic = rlQValueFunction(criticNetwork,obsInfo,actInfo, ...
    ObservationInputNames="netObsIn", ...
    ActionInputNames="netActIn");
% actor: maps the three observations directly to the voltage command
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(9) %3
    tanhLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
actorNetwork = dlnetwork(actorNetwork);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
agent = rlDDPGAgent(actor,critic);
agent.SampleTime = Ts;
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 64;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.8; %0.3
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5; %-5
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
trainOpts = rlTrainingOptions(...
    MaxEpisodes=4000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=800,...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=600);
doTraining = true;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
end
function in = localResetFcn(in)
    % randomize the reference speed for each episode
    blk = sprintf('DCMotorRL2/omega_ref');
    h = randi([2000,4000]);
    in = setBlockParameter(in,blk,'Value',num2str(h));
    % optionally also randomize the initial speed (1/min converted to rad/s)
    % h = randi([2000,4000])*(2*pi)/60;
    % blk = 'DCMotorRL2/DCMotor/Integrator1';
    % in = setBlockParameter(in,blk,'InitialCondition',num2str(h));
end
2 Comments
awcii on 17 Aug 2023
Did you solve your problem?
Franz Schnyder on 17 Aug 2023
Yes, increasing the noise variance helped. But in general I had to change many other settings, such as the observations and the neural networks, to get a more or less satisfactory result. In the end, the remaining problem was that the agent oscillated slightly around the setpoint in simulation, and this had too strong an influence on the real setup.


Accepted Answer

Emmanouil Tzorakoleftherakis on 20 Mar 2023
Some comments:
1) 150 episodes is really not much; you need to let the training continue for longer.
2) There is no guarantee that the reward will always go up. It may drop while the agent explores, and the agent may then find a better policy along the way.
3) Noise variance is critical with DDPG agents. Make sure this value is between 1% and 10% of your action range (see the sketch after this list).
4) A sample time of 0.1 seconds seems too large for a motor control application.
5) This example does FOC with RL, but you may be able to use it for general information:
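As a rough illustration of points 3) and 4) applied to the agent options from the question (the 5% figure and the 0.01 s sample time are assumptions to start tuning from, not a recipe):
% Sketch for points 3) and 4); values are assumptions to be tuned.
actionRange = 24 - 0;                                           % action spans 0..24 V
agent.AgentOptions.NoiseOptions.Variance = 0.05*actionRange;    % ~5% of the action range
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5;       % decay slowly so exploration lasts
Ts = 0.01;                                                      % smaller agent sample time for motor control
agent.SampleTime = Ts;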

More Answers (0)


Version

R2022b
