Create Custom Simulink Environments
To create a custom Simulink® environment, first create a Simulink environment model that represents the world as seen by the agent. Such a system is often referred to as the plant or open-loop system, while the whole (integrated) system that includes both agent and environment is often referred to as the closed-loop system.
Your environment model must have an input signal, the action, which influences (through
some discrete, continuous or mixed dynamics) its next internal state and its outputs, which
are the observation, the reward and the is-done signals. The is-done signal is a scalar that
indicates the termination of an episode, causing the simulation to stop when its value is
true.
To avoid wasting computational resources, update the observation, reward and is-done signals at the same execution rate used by the RL Agent block.
Note
A reinforcement learning environment is normally assumed to be strictly causal from the current action to the current observation. That is, it is assumed that the current observation does not depend on the current action (while the next state generally does). In other words, there must be no direct feedthrough between the current action and the current observation.
Note
The reward signal at time t must be the one corresponding to the transition between the observation output at time t-1 and the observation output at time t.
If your observation contains multiple channels, group the signals carried by the channels into a single observation bus. Similarly, for a hybrid environment, your action must be a two-element bus containing both the discrete (first) and the continuous (second) action channel. For more information about bus signals, see Simulink Bus Capabilities (Simulink).
For critical considerations on defining reward and observation signals in custom environments, see Define Observation and Reward Signals in Custom Environments.
Once you have created the Simulink model that represents the environment, you must add the RL Agent block to it. You can do so automatically or manually.
To automatically create a new closed-loop Simulink model that contains an RL Agent block and references your environment model from its Environment block, use the createIntegratedEnv function, specifying the names of both your existing environment model and the new closed-loop model to be created. You can specify as input arguments the names of the action, observation, is-done, and reward ports in your environment model. If your action or observation space is finite, you can also specify its possible values (otherwise the signals are assumed to be continuous).
This function returns an environment object as well as the block path of the agent and the environment observation and action specifications. For more information on model referencing, see Model Reference Behavior and Capabilities (Simulink).
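For example, a minimal sketch (both model names are hypothetical):
% Create a closed-loop model, myIntegratedModel, that contains an RL Agent
% block and references the existing environment model myEnvModel.
[env,agentBlk,obsInfo,actInfo] = createIntegratedEnv( ...
    "myEnvModel","myIntegratedModel");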
To manually add the agent to your model, drag and drop the RL Agent block from the Reinforcement Learning Simulink library. Connect the action, observation, reward and is-done signals to the appropriate output and input ports of the block.
Unless you already have an agent object for this environment in the MATLAB® workspace, you must create specification objects for the action and observation signals using rlNumericSpec (for continuous signals) or rlFiniteSetSpec (for discrete signals). For bus signals, create specifications using bus2RLSpec. Once you connect the blocks, create an environment object using rlSimulinkEnv, specifying the model filename, the block path to the RL Agent block within the model, and the specification objects for the observation and the action channels, respectively. If your agent block already references an agent object in the MATLAB workspace, you do not need to supply the specification objects as input arguments. For an example, see Water Tank Custom Simulink Environment for Reinforcement Learning and Control Water Level in a Tank Using a DDPG Agent.
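For instance, the following sketch assumes a model file myModel.slx containing an RL Agent block named RL Agent, with a three-element continuous observation channel and a scalar discrete action channel (all names, dimensions, and values are illustrative):
% Continuous observation channel carrying three signals.
obsInfo = rlNumericSpec([3 1]);
% Discrete action channel with three possible values.
actInfo = rlFiniteSetSpec([-1 0 1]);
% Create the environment interface from the model name, the block path
% to the RL Agent block, and the specification objects.
env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);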
Both rlSimulinkEnv and
createIntegratedEnv
return a custom Simulink environment as a SimulinkEnvWithAgent
object. This environment object acts as an interface so that when you call sim or train, these
functions in turn call the (compiled) Simulink model associated with the object to generate experiences for the agents. You can
use this object to train and simulate agents in the same way as with any other
environment.
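For example, assuming agent is an agent object consistent with the environment specifications, a minimal training and simulation sketch (the option values are illustrative) is:
% Train the agent in the Simulink environment.
trainOpts = rlTrainingOptions(MaxEpisodes=100);
trainResults = train(agent,env,trainOpts);
% Simulate the trained agent for one episode.
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env,agent,simOpts);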
Note
Before training or simulating an agent within a Simulink environment, to make sure that the RL Agent block runs at
the intended sample time, set the SampleTime property of your agent
object appropriately.
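For example, if the environment updates its output signals every 0.1 seconds (an illustrative value), you can set the sample time through the agent options:
% Match the agent sample time to the execution rate used in the model.
agent.AgentOptions.SampleTime = 0.1;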
You can also create a multiagent Simulink environment. To do so, create a Simulink model that has one action input and one set of outputs (observation, reward and is-done) for every agent. Then manually add an agent block for each agent. Once you connect the blocks, create an environment object using rlSimulinkEnv. Unless each agent block already references an agent object in the MATLAB workspace, you must supply to rlSimulinkEnv two cell arrays containing the observation and action specification objects, respectively, as input arguments. For an example, see Train Multiple Agents to Perform Collaborative Task.
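For instance, a minimal sketch for a two-agent model (the model name, block paths, and specification dimensions are all hypothetical):
% Block paths of the two RL Agent blocks within the model.
agentBlks = ["myMAModel/RL Agent1","myMAModel/RL Agent2"];
% One observation and one action specification object per agent.
obsInfos = {rlNumericSpec([4 1]),rlNumericSpec([4 1])};
actInfos = {rlNumericSpec([1 1]),rlFiniteSetSpec([-1 1])};
env = rlSimulinkEnv("myMAModel",agentBlks,obsInfos,actInfos);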
Your environment can also include third-party functionality. For more information, see Integrate Components from External Tools (Simulink).
Algebraic Loops Between Environment and Agent
To avoid (potentially unsolvable) algebraic loops, you must avoid any direct feedthrough (that is, any direct dependency in the same time step) from the action to the observation output signal. This is because in the Simulink implementation of the agent block, the action at a given time step depends on the observation at the same time step. In other words, the agent block has a direct feedthrough from its observation input to its action output (similarly to an output feedback controller).
Avoiding a direct feedthrough from the action to the observation output signal is also in line with the fact that the standard formulation of a reinforcement learning environment as a Markov Decision Process is strictly causal from the current action to the current observation, since the current state does not depend on the current action (while the next state generally does).
However, note that for models created using createIntegratedEnv, the environment block is a referenced subsystem. Referenced subsystems are normally treated as direct feedthrough blocks (including the path from action to observation) unless the Minimize artificial algebraic loop occurrences parameter in the referenced subsystem is enabled. When the referenced subsystem has no direct feedthrough from an input port that participates in an artificial algebraic loop to any of its output ports, enabling this parameter can remove artificial algebraic loops involving the referenced subsystem.
In general, adding a Delay (Simulink) or Memory (Simulink) block to the action signal between the agent block and the environment block removes the algebraic loop. When you add an action delay, make sure that your reset function, which is called at the beginning of each training or simulation episode, initializes the delay to a feasible value.
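For example, if you break the loop with a Delay block named Action Delay, your reset function can initialize the delay state using the setBlockParameter method of the Simulink.SimulationInput object (the block path and initial value are hypothetical):
% Initialize the action delay to a feasible value at the start of each episode.
env.ResetFcn = @(in) setBlockParameter(in, ...
    "myModel/Action Delay","InitialCondition","0");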
Alternatively, you can add delay blocks to all the environment output signals after the environment block. If you do so, make sure that your reset function initializes the delays to feasible values that are also consistent with the initial state of the environment.
Note
In general, adding delays to solve algebraic loops should be done with extreme care, as it involves a modification of the loop dynamics.
If you have separate state and output functions (instead of a single step function), you can call them using separate MATLAB Function (Simulink) blocks, using a delay to represent the environment state. If you do so, your reset function only needs to initialize the state.
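For example, the code inside the two MATLAB Function (Simulink) blocks might look like the following sketch, in which the dynamics, reward, and termination condition are all illustrative assumptions. Here, the state signal x is the output of the delay block, which is fed by xNext.
function xNext = envStateFcn(x,u)
% Compute the next environment state from the current state and action.
xNext = 0.9*x + 0.1*u;
end
function [y,r,isDone] = envOutputFcn(x)
% Compute observation, reward, and is-done from the state only, so there
% is no direct feedthrough from the action to the observation.
y = x;                 % observation
r = -x^2;              % reward
isDone = abs(x) > 10;  % terminate when the state leaves a bound
end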
For more information on algebraic loops and how to remove some of them, see Algebraic Loop Concepts (Simulink) and Remove Algebraic Loops (Simulink). For a related example about using delays in a reinforcement learning loop implemented in Simulink, see Create and Simulate Same Environment in Both MATLAB and Simulink.
Reset Function for Simulink Environments
When you create a Simulink environment, you also typically create a custom reset function that sets parameters or initial states appropriately every time a training or simulation episode starts. For example, you can create a reset function that randomizes certain block states so that each training episode begins from different initial conditions.
To specify your reset function, assign the ResetFcn property of the
environment to a function handle or anonymous function handle. The function must have a
single Simulink.SimulationInput input argument and a single
Simulink.SimulationInput output argument. The output object specifies
temporary changes applied to the model for the duration of the simulation or training episode.
For more information about Simulink simulation input objects, see Simulink.SimulationInput (Simulink).
For example, if you have an existing reset function myResetFunction
on the MATLAB path, you can set ResetFcn to an anonymous function that
in turn calls myResetFunction.
env.ResetFcn = @(in)myResetFunction(in);
Note that you can use the anonymous function to pass additional input arguments whose values are available when the function is defined. For example, you can pass the additional arguments arg1 and arg2 as follows.
env.ResetFcn = @(in)myResetFunction(in,arg1,arg2);
Here, when env.ResetFcn is created, its workspace includes the values
of arg1 and arg2. These values persist within the
reset function workspace even if you clear the variables from the MATLAB workspace. When
env.ResetFcn is evaluated, it invokes
myResetFunction and passes it a copy of the values that
arg1 and arg2 had when the reset function was
defined. For more information, see Anonymous Functions.
If your reset behavior is simple, you can implement it without writing the separate function myResetFunction. For example, the following code uses setVariable (Simulink) to set the variable x0 to a random value in the model workspace and returns the change in the Simulink.SimulationInput (Simulink) object in.
env.ResetFcn = @(in) setVariable(in,'x0',rand());
Here, the value of x0 that you specify overrides the existing x0 value in the model workspace for the duration of the simulation or training. The value of x0 then reverts to the original value when the simulation or training completes.
If you call the reset function of a SimulinkEnvWithAgent object whose ResetFcn property is empty, the function returns a Simulink.SimulationInput object for the unmodified Simulink model.
The sim function calls
the reset function at the start of each simulation episode, and the train function
calls it at the start of each training episode. To make sure that the reset function always
uses the same random initial values when it is called the first time in your script, fix the
random stream generator before calling train or sim. For example,
fix the random number stream with seed 0 and random number algorithm
Mersenne Twister as
follows.
previousRngState = rng(0,"twister");
You can restore the previous state of the generator at the end of your script by passing previousRngState to rng. For more information on result reproducibility, see Results Reproducibility. For more information on reset functions for Simulink environments, see the ResetFcn property of SimulinkEnvWithAgent.
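For example, to restore the generator state at the end of the script:
% Restore the random number generator to its previous state.
rng(previousRngState);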
For examples, see rlSimulinkEnv,
Water Tank Custom Simulink Environment for Reinforcement Learning, and Train Multiple Agents to Perform Collaborative Task.
See Also
Functions
rlSimulinkEnv | createIntegratedEnv | setVariable (Simulink) | train | sim
Objects
SimulinkEnvWithAgent | Simulink.SimulationInput (Simulink)
Topics
- Water Tank Custom Simulink Environment for Reinforcement Learning
- Control Water Level in a Tank Using a DDPG Agent
- Create and Simulate Same Environment in Both MATLAB and Simulink
- Algebraic Loop Concepts (Simulink)
- Reinforcement Learning Environments
- Load Simulink Environments in Reinforcement Learning Designer