
How to pass agent-irrelevant non-constant state variables to custom MATLAB step functions (Reinforcement Learning Toolbox)

Hi! I'm looking into passing non-constant variables to my step function when creating an environment using function names (similar to the cart pole example: https://de.mathworks.com/help/reinforcement-learning/ug/create-custom-reinforcement-learning-environment-in-matlab.html#CreateMATLABEnvironmentUsingCustomFunctionsExample-3).
  • I can't use the "function handles approach" as the variable values change from step to step.
  • My variables are irrelevant to the agent (the agent shall not consider them when approximating the value function).
Therefore, I have the following questions:
Thanks in advance!
Marc

Answers (1)

Aditya on 24 Jan 2024
Hi Marc,
I understand that you are facing a problem related to Reinforcement Learning Toolbox. The following points might help you out.
In reinforcement learning environments within MATLAB, the step function typically returns four outputs: `NextObs`, `Reward`, `IsDone`, and `LoggedSignals`. Here's a brief explanation of each:
1. NextObs: This represents the next observation that the agent will receive after taking an action. This is the state representation that the agent uses to make decisions and learn from. It's crucial for the agent's learning process as it directly affects the agent's understanding of the environment's dynamics.
2. Reward: This is the immediate reward received after taking an action. The agent uses this to evaluate how good the action was with respect to achieving the goal.
3. IsDone: This is a boolean flag indicating whether the episode has ended. This could be due to the task being completed, a failure state being reached, or the maximum number of steps being exceeded.
4. LoggedSignals: This is additional information that you might want to keep track of for debugging or analysis but is not used by the agent for learning. It's essentially a way to log any extra data that you want to monitor or pass through the simulation without affecting the agent's decision-making process.
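The four outputs can be illustrated with a minimal sketch of a step function. It assumes the `(Action,LoggedSignals)` input signature used in the cart pole example; the function name `toyStepFunction`, the 1-D dynamics, and the reward are hypothetical and chosen only for illustration.

```matlab
function [NextObs,Reward,IsDone,LoggedSignals] = toyStepFunction(Action,LoggedSignals)
% Hypothetical 1-D example: the action moves a point along a line.
% LoggedSignals carries the environment state from one step to the next.

% Unpack the state stored by the previous step (or by the reset function).
x = LoggedSignals.State;

% Advance the state by the applied action and store it for the next call.
x = x + Action;
LoggedSignals.State = x;

% 1. NextObs: what the agent sees next (here simply the state itself).
NextObs = x;

% 2. Reward: penalize the distance from the origin.
Reward = -abs(x);

% 3. IsDone: end the episode once the point leaves [-10, 10].
IsDone = abs(x) > 10;

% 4. LoggedSignals: returned as-is so the next call receives the new state.
end
```

Calling it with `LoggedSignals.State = 2` and `Action = 3` should give an observation of 5, a reward of -5, and `IsDone` equal to false.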
Difference between `NextObs` and `LoggedSignals`: `NextObs` is the actual state information that the agent uses to learn and make future decisions. `LoggedSignals` is additional information that you want to log or pass along but is not used by the agent for learning. In the cart pole example, they look similar because the example stores the full state in `LoggedSignals.State` and returns that same state as the observation. In more complex environments, `LoggedSignals` could include a variety of other data points.
Does the agent only consider the `NextObs` variable in training: Yes, the agent only considers the `NextObs` variable for training. The `LoggedSignals` are not used in the learning process and can be used to pass additional information through the simulation.
So, if you have variables that change from step to step and are not relevant for the agent's decision-making process, you can indeed pass them using the `LoggedSignals` output. The `LoggedSignals` structure returned by one call to the step function is handed back as the `LoggedSignals` input of the next call, so any field you update in it is available in the following step. This allows you to keep track of these variables without influencing the agent's learning algorithm. The agent's value function approximation will only be based on the observations (`NextObs`) and rewards (`Reward`) it receives.
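As a rough sketch of this approach, the reset and step functions below keep two non-constant variables in `LoggedSignals` and expose only the state as the observation. The function names, the fields `StepCount` and `Disturbance`, and the toy dynamics are all hypothetical; only the `LoggedSignals.State` convention is taken from the cart pole example.

```matlab
function [InitialObservation,LoggedSignals] = hiddenResetFunction()
% Hypothetical reset: initialize the observed state and the hidden variables.
LoggedSignals.State = 0;          % returned to the agent as the observation
LoggedSignals.StepCount = 0;      % hidden from the agent, changes every step
LoggedSignals.Disturbance = 0.5;  % hidden from the agent, changes every step
InitialObservation = LoggedSignals.State;
end

function [NextObs,Reward,IsDone,LoggedSignals] = hiddenStepFunction(Action,LoggedSignals)
% Hypothetical step: the hidden variables are read and updated here but are
% never copied into NextObs, so the agent does not receive them.

% Read the values written by the previous step (or by the reset function).
x = LoggedSignals.State;
d = LoggedSignals.Disturbance;

% Toy dynamics that use the current value of the hidden variable.
x = x + Action + d;

% Update the non-constant hidden variables for the next step.
LoggedSignals.StepCount = LoggedSignals.StepCount + 1;
LoggedSignals.Disturbance = -d;   % alternate the sign each step

% Store the new state and expose only the state as the observation.
LoggedSignals.State = x;
NextObs = x;
Reward = -abs(x);
IsDone = abs(x) > 10 || LoggedSignals.StepCount >= 100;
end
```

With each function saved in its own file, the environment could then be created by name as in the cart pole example, for instance `rlFunctionEnv(obsInfo,actInfo,'hiddenStepFunction','hiddenResetFunction')`, where `obsInfo` and `actInfo` are specifications you define for a scalar observation and your action set.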
Keep in mind that while `LoggedSignals` won't affect the agent's learning directly, you should ensure that the variables you pass through `LoggedSignals` do not inadvertently leak information about the environment that the agent should not have access to, as this could bias the agent's learning process in ways that might not generalize well to other situations or real-world applications.
  1 comment
Marc on 24 Jan 2024
Thank you Aditya for your answer!
This is what I expected, and the code behaved correctly. Great to get a confirmation from the community! I hope this clarification will help others facing similar questions!
Kind regards.
Marc


Release

R2020b
