RL designer toolbox | PPO agent | NaN output as policy

Atusa on 16 Apr 2025

https://es.mathworks.com/matlabcentral/answers/2176378-rl-designer-toolbox-ppo-agent-nan-output-as-policy

Commented: praguna manvi on 22 Apr 2025

Dear all,

I'm using a PPO RL agent in a Simulink environment, and I'm training it using the Reinforcement Learning Designer app.

I'm getting the following error partway through training, at varying episodes:

Error:Block '.../agent Obj/Evaluate Policy/Execute Policy/Enabled Policy Evaluator/Policy Evaluator/Policy Process Experience Internal' outputs 'NaN' for element 1 of output port 1 at major time step 0.

My agent's action is a single numeric value bounded to [-1, 1], so I would not expect any NaN values.

I'm using an rlContinuousGaussianActor with a softplusLayer as the last layer of the standard-deviation output, and a tanhLayer followed by a scalingLayer with Scale=1 as the output layer for the action mean. I also define the action specification with the following command:

actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);
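
A minimal sketch of an actor matching the description above may make the setup easier to reproduce. The observation size, the single 64-unit hidden layer, and all layer names are hypothetical (the actual network is not shown in this post); only the tanhLayer + scalingLayer mean path, the softplusLayer standard-deviation path, and the action specification come from the description.

```matlab
% Assumption: a 4-element observation vector (not stated in the post).
numObs  = 4;
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Shared trunk (hypothetical size and names).
commonPath = [
    featureInputLayer(numObs, Name="obsIn")
    fullyConnectedLayer(64, Name="fc1")
    reluLayer(Name="commonOut")];

% Mean path: tanh followed by a scalingLayer with Scale=1, as described.
meanPath = [
    fullyConnectedLayer(1, Name="meanFC")
    tanhLayer(Name="meanTanh")
    scalingLayer(Name="meanOut", Scale=1)];

% Standard-deviation path: softplus keeps the output nonnegative.
stdPath = [
    fullyConnectedLayer(1, Name="stdFC")
    softplusLayer(Name="stdOut")];

% Assemble the two-headed network.
lg = layerGraph(commonPath);
lg = addLayers(lg, meanPath);
lg = addLayers(lg, stdPath);
lg = connectLayers(lg, "commonOut", "meanFC");
lg = connectLayers(lg, "commonOut", "stdFC");
net = dlnetwork(lg);

actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionMeanOutputNames="meanOut", ...
    ActionStandardDeviationOutputNames="stdOut");

% Sample one action for a random observation and check it for NaN.
act = getAction(actor, {rand(obsInfo.Dimension)});
hasNaN = any(isnan(act{1}(:)));
```

Note that in this sketch the tanh and scaling layers bound only the mean of the Gaussian; the sampled action and the standard deviation are separate quantities.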

I'm resetting my environment at each episode using the following command:

simEnv.ResetFcn = @(in) setVariable(in,"q",0,"Workspace",mdl);

I'm sure there's no singularity in my model, as the simulation runs with no error. The error only happens during training with the toolbox, and it doesn't always happen.

I'd appreciate it if you could help me solve this issue.

Thank you very much.