Create Custom Simulink Environments
To create a custom Simulink® environment, first create a Simulink environment model that represents the world as seen by the agent. Such a system is often referred to as the plant or open-loop system, while the whole (integrated) system that includes both agent and environment is often referred to as the closed-loop system.
Your environment model must have an input signal, the action, which influences (through
some discrete, continuous or mixed dynamics) its next internal state and its outputs, which
are the observation, the reward and the is-done signals. The is-done signal is a scalar that
indicates the termination of an episode, causing the simulation to stop when its value is
true.
To avoid wasting computational resources, update the observation, reward and is-done signals at the same execution rate used by the RL Agent block.
Note
A reinforcement learning environment is normally assumed to be strictly causal from the current action to the current observation. That is, it is assumed that the current observation does not depend on the current action (while the next state generally does). In other words, there must be no direct feedthrough between the current action and the current observation.
Note
The reward signal at time t must be the one corresponding to the transition between the observation output at time t-1 and the observation output at time t.
If your observation contains multiple channels, group the signals carried by the channels into a single observation bus. Similarly, for a hybrid environment, your action must be a two-element bus containing both the discrete (first) and the continuous (second) action channel. For more information about bus signals, see Simulink Bus Capabilities (Simulink).
For critical considerations on defining reward and observation signals in custom environments, see Define Observation and Reward Signals in Custom Environments.
Once you have created the Simulink model that represents the environment, you must add the RL Agent block to it. You can do so automatically or manually.
To automatically create a new closed-loop Simulink model that contains an RL Agent block and references your environment model from its Environment block, use the createIntegratedEnv function, specifying the names of both your existing environment model and the new closed-loop model to be created. You can specify as input arguments the names of the action, observation, is-done, and reward ports in your environment model. If your action or observation space is finite, you can also specify its possible values (otherwise the signals are assumed to be continuous).
This function returns an environment object as well as the block path of the agent and the environment observation and action specifications. For more information on model referencing, see Model Reference Behavior and Capabilities (Simulink).
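For example, a minimal sketch (both model names are hypothetical):
% Create a closed-loop model, myIntegratedModel, that contains an RL Agent
% block and references the existing environment model myEnvModel.
[env,agentBlk,obsInfo,actInfo] = createIntegratedEnv( ...
    "myEnvModel","myIntegratedModel");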
To manually add the agent to your model, drag and drop the RL Agent block from the Reinforcement Learning Simulink library. Connect the action, observation, reward and is-done signals to the appropriate output and input ports of the block.
Unless you already have an agent object for this environment in the MATLAB® workspace, you must create specification objects for the action and observation signals using rlNumericSpec (for continuous signals) or rlFiniteSetSpec (for discrete signals). For bus signals, create specifications using bus2RLSpec. Once you connect the blocks, create an environment object using rlSimulinkEnv, specifying the model filename, the block path to the RL Agent block within the model, and the specification objects for the observation and the action channels, respectively. If your agent block already references an agent object in the MATLAB workspace, you do not need to supply the specification objects as input arguments. For an example, see Water Tank Custom Simulink Environment for Reinforcement Learning and Control Water Level in a Tank Using a DDPG Agent.
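For instance, the following sketch assumes a model file myModel.slx containing an RL Agent block named RL Agent, with a three-element continuous observation channel and a scalar discrete action channel (all names, dimensions, and values are illustrative):
% Continuous observation channel carrying three signals.
obsInfo = rlNumericSpec([3 1]);
% Discrete action channel with three possible values.
actInfo = rlFiniteSetSpec([-1 0 1]);
% Create the environment interface from the model name, the block path
% to the RL Agent block, and the specification objects.
env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);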
Both rlSimulinkEnv and
createIntegratedEnv
return a custom Simulink environment as a SimulinkEnvWithAgent
object. This environment object acts as an interface so that when you call sim or train, these
functions in turn call the (compiled) Simulink model associated with the object to generate experiences for the agents. You can
use this object to train and simulate agents in the same way as with any other
environment.
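For example, assuming agent is an agent object consistent with the environment specifications, a minimal training and simulation sketch (the option values are illustrative) is:
% Train the agent in the Simulink environment.
trainOpts = rlTrainingOptions(MaxEpisodes=100);
trainResults = train(agent,env,trainOpts);
% Simulate the trained agent for one episode.
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env,agent,simOpts);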
Note
Before training or simulating an agent within a Simulink environment, to make sure that the RL Agent block runs at
the intended sample time, set the SampleTime property of your agent
object appropriately.
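For example, if the environment updates its output signals every 0.1 seconds (an illustrative value), you can set the sample time through the agent options:
% Match the agent sample time to the execution rate used in the model.
agent.AgentOptions.SampleTime = 0.1;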
You can also create a multiagent Simulink environment. To do so, create a Simulink model that has one action input and one set of outputs (observation, reward and is-done) for every agent. Then manually add an agent block for each agent. Once you connect the blocks, create an environment object using rlSimulinkEnv. Unless each agent block already references an agent object in the MATLAB workspace, you must supply to rlSimulinkEnv two cell arrays containing the observation and action specification objects, respectively, as input arguments. For an example, see Train Multiple Agents to Perform Collaborative Task.
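For instance, a minimal sketch for a two-agent model (the model name, block paths, and specification dimensions are all hypothetical):
% Block paths of the two RL Agent blocks within the model.
agentBlks = ["myMAModel/RL Agent1","myMAModel/RL Agent2"];
% One observation and one action specification object per agent.
obsInfos = {rlNumericSpec([4 1]),rlNumericSpec([4 1])};
actInfos = {rlNumericSpec([1 1]),rlFiniteSetSpec([-1 1])};
env = rlSimulinkEnv("myMAModel",agentBlks,obsInfos,actInfos);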
Your environment can also include third-party functionality. For more information, see Integrate Components from External Tools (Simulink).
Algebraic Loops Between Environment and Agent
To avoid (potentially unsolvable) algebraic loops, you must avoid any direct feedthrough (that is, any direct dependency in the same time step) from the action to the observation output signal. This is because in the Simulink implementation of the agent block, the action at a given time step depends on the observation at the same time step. In other words, the agent block has a direct feedthrough from its observation input to its action output (similarly to an output feedback controller).
Avoiding a direct feedthrough from the action to the observation output signal is also in line with the fact that the standard formulation of a reinforcement learning environment as a Markov Decision Process is strictly causal from the current action to the current observation, since the current state does not depend on the current action (while the next state generally does).
However, note that for models created using createIntegratedEnv, the environment block is a referenced subsystem. Referenced subsystems are normally treated as direct feedthrough blocks (including the path from action to observation) unless the Minimize artificial algebraic loop occurrences parameter in the referenced subsystem is enabled. When the referenced subsystem has no direct feedthrough from an input port that participates in an artificial algebraic loop to any of its output ports, enabling this parameter can remove artificial algebraic loops involving the referenced subsystem.
In general, adding a Delay (Simulink) or Memory (Simulink) block to the action signal between the agent block and the environment block removes the algebraic loop. When you add an action delay, make sure that your reset function, which is called at the beginning of each training or simulation episode, initializes the delay to a feasible value.
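For example, if you break the loop with a Delay block named Action Delay, your reset function can initialize the delay state using the setBlockParameter method of the Simulink.SimulationInput object (the block path and initial value are hypothetical):
% Initialize the action delay to a feasible value at the start of each episode.
env.ResetFcn = @(in) setBlockParameter(in, ...
    "myModel/Action Delay","InitialCondition","0");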
Alternatively, you can add delay blocks to all the environment output signals after the environment block. If you do so, make sure that your reset function initializes the delays to feasible values that are also consistent with the initial state of the environment.
Note
In general, adding delays to solve algebraic loops should be done with extreme care, as it involves a modification of the loop dynamics.
If you have separate state and output functions (instead of a single step function), you can call them using separate MATLAB Function (Simulink) blocks, using a delay to represent the environment state. If you do so, your reset function only needs to initialize the state.
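For example, the code inside the two MATLAB Function (Simulink) blocks might look like the following sketch, in which the dynamics, reward, and termination condition are all illustrative assumptions. Here, the state signal x is the output of the delay block, which is fed by xNext.
function xNext = envStateFcn(x,u)
% Compute the next environment state from the current state and action.
xNext = 0.9*x + 0.1*u;
end
function [y,r,isDone] = envOutputFcn(x)
% Compute observation, reward, and is-done from the state only, so there
% is no direct feedthrough from the action to the observation.
y = x;                 % observation
r = -x^2;              % reward
isDone = abs(x) > 10;  % terminate when the state leaves a bound
end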
For more information on algebraic loops and how to remove some of them, see Algebraic Loop Concepts (Simulink) and Remove Algebraic Loops (Simulink). For a related example about using delays in a reinforcement learning loop implemented in Simulink, see Create and Simulate Same Environment in Both MATLAB and Simulink.
Reset Function for Simulink Environments
When you create a Simulink environment, you also typically create a custom reset function that sets parameters or initial states appropriately every time a training or simulation episode starts. For example, you can create a reset function that randomizes certain block states so that each training episode begins from different initial conditions.
To specify your reset function, assign the ResetFcn property of the
environment to a function handle or anonymous function handle. The function must have a
single Simulink.SimulationInput input argument and a single
Simulink.SimulationInput output argument. The output object specifies
temporary changes applied to the model for the duration of the simulation or training episode.
For more information about Simulink simulation input objects, see Simulink.SimulationInput (Simulink).
For example, if you have an existing reset function myResetFunction
on the MATLAB path, you can set ResetFcn to an anonymous function that
in turn calls myResetFunction.
env.ResetFcn = @(in)myResetFunction(in);
Note that you can use the anonymous function to pass additional input arguments whose values are available when the function is defined. For example, you can pass the additional arguments arg1 and arg2 as follows.
env.ResetFcn = @(in)myResetFunction(in,arg1,arg2);
Here, when env.ResetFcn is created, its workspace includes the values
of arg1 and arg2. These values persist within the
reset function workspace even if you clear the variables from the MATLAB workspace. When
env.ResetFcn is evaluated, it invokes
myResetFunction and passes it a copy of the values that
arg1 and arg2 had when the reset function was
defined. For more information, see Anonymous Functions.
If your reset behavior is simple, you can implement it without writing the separate function myResetFunction. For example, the following code uses setVariable (Simulink) to set the variable x0 to a random value in the model workspace and returns the change in the Simulink.SimulationInput (Simulink) object in.
env.ResetFcn = @(in) setVariable(in,'x0',rand());
Here, the value of x0 that you specify overrides the existing x0 value in the model workspace for the duration of the simulation or training. The value of x0 then reverts to the original value when the simulation or training completes.
If you call the reset function of a SimulinkEnvWithAgent object whose ResetFcn property is empty, the function returns a Simulink.SimulationInput object for the unmodified Simulink model.
The sim function calls
the reset function at the start of each simulation episode, and the train function
calls it at the start of each training episode. To make sure that the reset function always
uses the same random initial values when it is called the first time in your script, fix the
random stream generator before calling train or sim. For example,
fix the random number stream with seed 0 and random number algorithm
Mersenne Twister as
follows.
previousRngState = rng(0,"twister");
You can restore the previous state of the generator at the end of your script by passing previousRngState to rng. For more information on result reproducibility, see Results Reproducibility. For more information on reset functions for Simulink environments, see the ResetFcn property of SimulinkEnvWithAgent.
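For example, to restore the generator state at the end of the script:
% Restore the random number generator to its previous state.
rng(previousRngState);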
For examples, see rlSimulinkEnv,
Water Tank Custom Simulink Environment for Reinforcement Learning, and Train Multiple Agents to Perform Collaborative Task.
See Also
Functions
rlSimulinkEnv | createIntegratedEnv | setVariable (Simulink) | train | sim
Objects
SimulinkEnvWithAgent | Simulink.SimulationInput (Simulink)
Topics
- Water Tank Custom Simulink Environment for Reinforcement Learning
- Control Water Level in a Tank Using a DDPG Agent
- Create and Simulate Same Environment in Both MATLAB and Simulink
- Algebraic Loop Concepts (Simulink)
- Reinforcement Learning Environments
- Load Simulink Environments in Reinforcement Learning Designer