
Reinforcement Learning agents: PG & AC output only NaN actions, DDPG and TD3 work with the same environment

For context:
I have implemented an environment that models the energy supply of a single-family house. The system consists of PV modules, a battery, an electric heater, a hot-water tank and a gas heater. A market situation with variable electricity prices (both for purchase and for feed-in) is assumed. The idea is to use RL to control the system. The observations are price forecasts, feed-in forecasts (PV) and load forecasts (electricity and heat). All energy flows are possible, e.g. PV direct supply (electricity), PV direct supply (heat, via the electric heater), battery charging (from PV/grid), heat-storage charging (from PV/grid via the electric heater), electricity feed-in to the grid (directly from PV and from the battery), ...
The actions are numeric, between 0 and 1. The idea is that, in sequence, the agent always chooses between 0 and 100% of the power that is still available for a given purpose.
Example:
%% Step function:
% First line: PV power used for load coverage -- the minimum of Action(1) (between 0 and 1) times the generated PV power, and the load demand.
PV_Load = min([Action(1)*PV_gen, Load_sys]);
% Second line: remaining load covered by the battery -- the minimum of Action(2) times the available battery charge, the remaining load, and the maximum battery power.
Batt_Load = min([Action(2)*Batt_stor, (Load_sys - PV_Load), this.Batt_P]);
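For reference, a minimal sketch of how the action and observation specifications for an environment like this might be declared (the dimensions, function names, and use of rlFunctionEnv below are illustrative assumptions, not the actual model):
% Hypothetical specs: continuous actions in [0, 1] and a forecast-based
% observation vector; the dimensions are placeholders for the real model.
actInfo = rlNumericSpec([8 1], 'LowerLimit', 0, 'UpperLimit', 1);
obsInfo = rlNumericSpec([24 1]);
% rlFunctionEnv wraps user-supplied step/reset functions into an environment.
env = rlFunctionEnv(obsInfo, actInfo, @myStepFcn, @myResetFcn);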
I hope the description is sufficient for a first impression.
Now I've tried to train different agents.
With DDPG and TD3, the first results are basically plausible. With PG and AC, however, all actions are output as NaN.
Can anyone give me a clue on this basis?

Answers (1)

Aiswarya on 26 Oct 2023
Hi,
I understand that you are trying to set the action output for your model using different RL agents, and you observe that DDPG and TD3 set the action output correctly, whereas PG and AC do not. The cause of the NaN actions is code-specific and can't be determined without the data, but the different behaviour of the agents can be explained as follows:
Different agents handle action-output bounds differently. DDPG and TD3 are off-policy agents and they clip all actions; the bounds can simply be set using "rlNumericSpec". AC and PG, on the other hand, are on-policy agents and do not enforce the constraints set in the action specification. If you want to enforce those limits, you have to do it explicitly on the environment side.
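For example, a minimal sketch of such an explicit clip inside the environment's step function (assuming an action vector intended to lie in [0, 1], as in the code above) could look like this:
% Saturate the incoming action to the intended [0, 1] range before it is
% used, since PG and AC do not clip to the rlNumericSpec bounds themselves.
Action = min(max(Action, 0), 1);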
One alternative is to set agent.UseExplorationPolicy = false after training, so that the agent uses only the mean of the action distribution and the actions always stay within the limits. You may refer to the documentation of the PG agent for more information: https://www.mathworks.com/help/reinforcement-learning/ug/pg-agents.html
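A rough usage sketch of that suggestion after training (obs is a placeholder for one observation from the environment):
agent.UseExplorationPolicy = false;   % use the deterministic mean action
action = getAction(agent, {obs});     % actions now stay within the specified limits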



Version

R2021b
