Borrar filtros
Borrar filtros

Reinforcement Learning . Sudden very high Rewards during training of RL model.

6 visualizaciones (últimos 30 días)
sir during the training i get sudden very high rewards of order 10e16 (shown in image attached) and i am unable to figure out what is causing this. here is the code i am using and i am also attaching the simulink model.
Tf = 10;
Ts = 0.1;
mdl = 'rl_exam2'
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, Response';
numObservations = obsInfo.Dimension(1)
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1);
actInfo.Name = 'Control Input';
numActions = actInfo.Dimension(1);
%% To Create Environment
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
%%
rng(0)
%%
%% To Create Critic Network
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(40,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([numActions 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(40,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(40,'Name','actorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(numActions,'Name','actorFC2')
tanhLayer('Name','actorTanh')
scalingLayer('Name','Action','Scale',0.5,'Bias',0.5)
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
%% To Create Agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',0.1,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',64,...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.08;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts)
%% Training Options
maxepisodes = 3000;
maxsteps = ceil(Tf/Ts);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',20, ...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',1500);
%% TO TRAIN
doTraining = true;
if doTraining
trainingStats = train(agent,env,trainingOpts);
% save('agent_new.mat','agent_ready') %%% to save agent ###
else
% Load pretrained agent for the example.
load('agent_old.mat','agent')
end

Respuesta aceptada

Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis el 25 de Mayo de 2023
You should first check the 'error' signal that you feed in the reward for those episodes. Could be that the error becomes too big/the system becomes unstable, which leads to those large negative values
  3 comentarios
Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis el 25 de Mayo de 2023
Can you send a screenshot? I do not know the specifics of the problem but the reward I saw did not look unreasonable. You can maybe use a smaller scaling factor instead of 15 which I believe you had. But the spikes are probably still coming from the system going unstable. You may want to consider implementing an IsDone signal to stop the simulation if the system goes unstable
Sourabh
Sourabh el 28 de Mayo de 2023
i have tried few reward functions but most i get is my response settle at 0.3 i dont know why
plz have a look

Iniciar sesión para comentar.

Más respuestas (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by