How to pass agent-irrelevant non-constant state variables to custom MATLAB step functions (Reinforcement Learning Toolbox)

Question

Marc David Rabe on 18 Feb 2022

Direct link to this question

https://es.mathworks.com/matlabcentral/answers/1652960-how-to-pass-agent-irrelevant-non-constant-state-variables-to-custom-matlab-step-functions-reinforce

Commented: Marc on 24 Jan 2024

Hi! I'm looking into passing non-constant variables to my step function when creating an environment using function names (similar to the cart pole example: https://de.mathworks.com/help/reinforcement-learning/ug/create-custom-reinforcement-learning-environment-in-matlab.html#CreateMATLABEnvironmentUsingCustomFunctionsExample-3).

I can't use the "function handles approach" as the variable values change from step to step.
My variables are irrelevant to the agent (the agent shall not consider them when approximating the value function).

Therefore, I have the following questions:

1. As the step function returns [NextObs, Reward, IsDone, LoggedSignals] and NextObs & LoggedSignals seem to contain the same information in the cart pole example, how do they differ? The question has been asked before (https://de.mathworks.com/matlabcentral/answers/570985-reinforcement-learning-nextobs-vs-loggedstate-in-step-function), but wasn't answered explicitly.
2. Does the agent only consider the NextObs variable in training, so that it would be possible to pass variables from step to step using the LoggedSignals variable?

Thanks in advance!

Marc

Answer 1

Aditya on 24 Jan 2024

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/1652960-how-to-pass-agent-irrelevant-non-constant-state-variables-to-custom-matlab-step-functions-reinforce#answer_1396466

Hi Marc,

I understand that you are facing a problem related to Reinforcement Learning Toolbox. The following points might help you out.

In reinforcement learning environments within MATLAB, the step function typically returns four outputs: `NextObs`, `Reward`, `IsDone`, and `LoggedSignals`. Here's a brief explanation of each:

1. NextObs: This represents the next observation that the agent will receive after taking an action. This is the state representation that the agent uses to make decisions and learn from. It's crucial for the agent's learning process as it directly affects the agent's understanding of the environment's dynamics.

2. Reward: This is the immediate reward received after taking an action. The agent uses this to evaluate how good the action was with respect to achieving the goal.

3. IsDone: This is a boolean flag indicating whether the episode has ended. This could be due to the task being completed, a failure state being reached, or the maximum number of steps being exceeded.

4. LoggedSignals: This is additional information that you might want to keep track of for debugging or analysis but is not used by the agent for learning. It's essentially a way to log any extra data that you want to monitor or pass through the simulation without affecting the agent's decision-making process.

Difference between `NextObs` and `LoggedSignals`: `NextObs` is the actual state information that the agent uses to learn and make future decisions. `LoggedSignals` is additional information that you want to log or pass along but is not used by the agent for learning. In the cart pole example they look the same because the example stores the full state in `LoggedSignals.State` and returns that same vector as the observation, so nothing extra is logged. In more complex environments, `LoggedSignals` could include a variety of other data points.

Does the agent only consider the `NextObs` variable in training: Yes, the agent only considers the `NextObs` variable for training. The `LoggedSignals` are not used in the learning process and can be used to pass additional information through the simulation.

So, if you have variables that change from step to step and are not relevant for the agent's decision-making process, you can indeed pass them using the `LoggedSignals` output. The `LoggedSignals` returned by one call of the step function is supplied as the input argument to the next call, which lets you carry these variables forward without influencing the agent's learning algorithm. The agent's value function approximation will only be based on the observations (`NextObs`) and rewards (`Reward`) it receives.
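To make this concrete, here is a minimal sketch of a step/reset pair that carries an extra variable in `LoggedSignals`. Everything in it is an assumption made up for illustration and not taken from your code: the names `myStepFunction` and `myResetFunction`, the fields `State`, `Gain` and `StepCount`, and the toy dynamics and reward. The loop at the top only imitates what the environment does, which is to hand the `LoggedSignals` from one step to the next, so it runs without the toolbox.

```matlab
% Manual rollout: call the functions the way the environment would,
% feeding each step's LoggedSignals into the next step.
[obs, LoggedSignals] = myResetFunction();
for k = 1:3
    action = 1;  % fixed action, for illustration only
    [obs, reward, isDone, LoggedSignals] = myStepFunction(action, LoggedSignals);
    fprintf('step %d: obs = %g, hidden gain = %g\n', k, obs, LoggedSignals.Gain);
end

function [InitialObservation, LoggedSignals] = myResetFunction()
    % State seen by the agent (hypothetical scalar position).
    LoggedSignals.State = 0;
    % Variables the agent never sees but the step function needs.
    LoggedSignals.Gain = 1.0;
    LoggedSignals.StepCount = 0;
    % Only the state is returned as the observation.
    InitialObservation = LoggedSignals.State;
end

function [NextObs, Reward, IsDone, LoggedSignals] = myStepFunction(Action, LoggedSignals)
    % Read the values stored by the previous step (or by reset).
    state = LoggedSignals.State;
    gain  = LoggedSignals.Gain;

    % Toy dynamics: the hidden gain scales the effect of the action.
    NextObs = state + gain * Action;

    % Toy reward and termination condition.
    Reward = -abs(NextObs - 2);
    IsDone = abs(NextObs) > 10;

    % Store the updated values for the next call.
    LoggedSignals.State = NextObs;
    LoggedSignals.Gain = 0.5 * gain;   % changes from step to step
    LoggedSignals.StepCount = LoggedSignals.StepCount + 1;
end
```

With the toolbox, the two functions would typically be saved in their own files and passed by name, for example `rlFunctionEnv(obsInfo, actInfo, "myStepFunction", "myResetFunction")`, where `obsInfo` describes only `NextObs` (here a scalar) and says nothing about the extra fields.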

Keep in mind that `LoggedSignals` stays out of the agent's learning only as long as its contents stay out of `NextObs`. If your step function builds the observation from these variables, information that the agent should not have access to leaks into what it sees, and this could bias the agent's learning process in ways that might not generalize well to other situations or real-world applications.

Marc on 24 Jan 2024

Thank you Aditya for your answer!

This was what I expected, and the code behaved correctly. Great to get a confirmation from the community! I hope this clarification will help others facing similar questions!

Kind regards.

Marc
