RL agent does not learn properly

28 views (last 30 days)
Franz Schnyder on 20 Mar 2023
Commented: Franz Schnyder on 17 Aug 2023
Hello everyone
I am trying to learn the Reinforcement Learning Toolbox and want to control the speed of a DC motor with an RL agent, replacing a PI controller. I based my setup on the water tank example, but I am running into problems during training.
First, the agent tends to settle at either the minimum (0 rpm) or the maximum (6000 rpm) and then no longer changes its output, even though it had already achieved good rewards in earlier episodes.
In my reward function I use the error between the target and the measured speed, expressed as a percentage. When I add a penalty so that the agent does not stay at 0 rpm, it still stays at 0 rpm and does not explore the surrounding region. I also have trouble removing the remaining steady-state control error.
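To make the idea concrete, here is a rough MATLAB sketch of what my reward does (the actual reward is built as a Simulink block; the penalty threshold and weights below are only placeholders, not my real values):
% Sketch of the reward idea: percentage speed error plus a penalty near 0 rpm.
% The threshold (100 rpm) and penalty weight (50) are placeholders for illustration.
function r = speedReward(omega_ref, omega_meas)
    pct_error = abs(omega_ref - omega_meas)/omega_ref*100;  % speed error in percent
    r = -pct_error;                                          % smaller error -> higher reward
    if omega_meas < 100                                      % discourage sitting near 0 rpm
        r = r - 50;
    end
end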
Below are the code and some pictures of the training.
close all
% DC motor parameters
R = 7.03;          % armature resistance [Ohm]
L = 1.04*10^-3;    % armature inductance [H]
J = 44.2*10^-7;    % rotor inertia [kg*m^2]
a = 2.45*10^-6;    % viscous friction coefficient
Kn = 250*2*pi/60;  % speed constant (250 rpm/V converted to rad/(s*V))
Km = 38.2*10^-3;   % torque constant [Nm/A]
% action: motor voltage, limited to 0..24 V ('spannung' = voltage)
actInfo = rlNumericSpec([1 1],'LowerLimit', 0, 'UpperLimit', 24);
actInfo.Name = 'spannung';
obsInfo = rlNumericSpec([3 1],...
    'LowerLimit',[-inf -inf -inf]',...
    'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured rpm';
env = rlSimulinkEnv("DCMotorRL2", 'DCMotorRL2/RL Agent',...
    obsInfo, actInfo);
env.ResetFcn = @(in)localResetFcn(in);
Ts = 0.1; %agent sample time
Tf = 20; %simulation time
rng(0)
% critic: Q-value network with separate state and action paths joined by addition
statePath = [
    featureInputLayer(obsInfo.Dimension(1),Name="netObsIn")
    fullyConnectedLayer(50)
    reluLayer
    fullyConnectedLayer(25,Name="CriticStateFC2")];
actionPath = [
    featureInputLayer(actInfo.Dimension(1),Name="netActIn")
    fullyConnectedLayer(25,Name="CriticActionFC1")];
commonPath = [
    additionLayer(2,Name="add")
    reluLayer
    fullyConnectedLayer(1,Name="CriticOutput")];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork, ...
    "CriticStateFC2", ...
    "add/in1");
criticNetwork = connectLayers(criticNetwork, ...
    "CriticActionFC1", ...
    "add/in2");
criticNetwork = dlnetwork(criticNetwork);
figure
plot(criticNetwork)
critic = rlQValueFunction(criticNetwork,obsInfo,actInfo, ...
    ObservationInputNames="netObsIn", ...
    ActionInputNames="netActIn");
% actor: maps the three observations directly to the voltage command
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(9) %3
    tanhLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
actorNetwork = dlnetwork(actorNetwork);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
agent = rlDDPGAgent(actor,critic);
agent.SampleTime = Ts;
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 64;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.8; %0.3
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5; %-5
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
trainOpts = rlTrainingOptions(...
    MaxEpisodes=4000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=800,...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=600);
doTraining = true;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
end
function in = localResetFcn(in)
    % randomize the reference speed for each episode
    blk = sprintf('DCMotorRL2/omega_ref');
    h = randi([2000,4000]);
    in = setBlockParameter(in,blk,'Value',num2str(h));
    % optionally also randomize the initial speed (1/min converted to rad/s)
    % h = randi([2000,4000])*(2*pi)/60;
    % blk = 'DCMotorRL2/DCMotor/Integrator1';
    % in = setBlockParameter(in,blk,'InitialCondition',num2str(h));
end
2 Comments
awcii on 17 Aug 2023
Did you solve your problem?
Franz Schnyder on 17 Aug 2023
Yes, increasing the noise variance helped. But in general I had to change many other settings, such as the observations and the neural networks, to get a more or less satisfactory result. In the end, the remaining problem was that the agent oscillated slightly around the setpoint in simulation, and this had too strong an influence on the real setup.


Accepted Answer

Emmanouil Tzorakoleftherakis on 20 Mar 2023
Some comments:
1) 150 episodes is really not much; you need to let the training continue for longer.
2) There is no guarantee that the reward will always go up. It may drop while the agent explores, and the agent may then find a better policy along the way.
3) Noise variance is critical with DDPG agents. Make sure this value is between 1% and 10% of your action range (see the sketch after this list).
4) A sample time of 0.1 seconds seems too large for a motor control application.
5) This example does FOC with RL, but you may be able to use it for general information:
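As a rough illustration of points 3) and 4) applied to the agent options from the question (the 5% figure and the 0.01 s sample time are assumptions to start tuning from, not a recipe):
% Sketch for points 3) and 4); values are assumptions to be tuned.
actionRange = 24 - 0;                                           % action spans 0..24 V
agent.AgentOptions.NoiseOptions.Variance = 0.05*actionRange;    % ~5% of the action range
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5;       % decay slowly so exploration lasts
Ts = 0.01;                                                      % smaller agent sample time for motor control
agent.SampleTime = Ts;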

More Answers (0)


Version

R2022b
