

Reinforcement learning environment with a dynamic model implemented in Simulink


The SimulinkEnvWithAgent object represents a reinforcement learning environment that uses a dynamic model implemented in Simulink®. The environment object acts as an interface: when you call sim or train, these functions in turn run the Simulink model to generate experiences for the agents.


To create a SimulinkEnvWithAgent object, use one of the following functions.

  • rlSimulinkEnv — Create an environment using a Simulink model with at least one RL Agent block.

  • createIntegratedEnv — Use a reference model as a reinforcement learning environment.

  • rlPredefinedEnv — Create a predefined reinforcement learning environment.



Model: Simulink model name, specified as a string or character vector. The specified model must contain one or more RL Agent blocks.

AgentBlock: Agent block paths, specified as a string or string array.

If Model contains a single RL Agent block for training, then AgentBlock is a string containing the block path.

If Model contains multiple RL Agent blocks for training, then AgentBlock is a string array, where each element contains the path of one agent block.

Model can contain RL Agent blocks whose path is not included in AgentBlock. Such agent blocks behave as part of the environment and select actions based on their current policies. When you call sim or train, the experiences of these agents are not returned and their policies are not updated.

The agent blocks can be inside a model reference. For more information on configuring an agent block for reinforcement learning, see RL Agent.

ResetFcn: Reset behavior for the environment, specified as a function handle or anonymous function handle. The function must have a single Simulink.SimulationInput input argument and a single Simulink.SimulationInput output argument.

The reset function sets the initial state of the Simulink environment. For example, you can create a reset function that randomizes certain block states such that each training episode begins from different initial conditions.

If you have an existing reset function myResetFunction on the MATLAB® path, set ResetFcn using a handle to the function.

env.ResetFcn = @(in)myResetFunction(in);

If your reset behavior is simple, you can implement it using an anonymous function handle. For example, the following code sets the variable x0 to a random value.

env.ResetFcn = @(in) setVariable(in,'x0',rand());

The sim function calls the reset function to reset the environment at the start of each simulation, and the train function calls it at the start of each training episode.
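
As a sketch of the named-function approach, the following hypothetical myResetFunction randomizes x0 before each simulation or episode. The variable name x0 and the range [-1,1] are assumptions chosen for illustration; adapt them to your model. Save the function in a file named myResetFunction.m on the MATLAB path.

```matlab
function in = myResetFunction(in)
% myResetFunction  Hypothetical reset function for a Simulink environment.
% The input and output are both Simulink.SimulationInput objects.

% Draw a random initial condition, uniformly distributed in [-1,1]
% (illustrative range, not taken from any specific model).
x0 = 2*rand() - 1;

% Override the variable x0 for the next simulation or episode.
in = setVariable(in,"x0",x0);
end
```

A named function like this is easier to extend than an anonymous handle when the reset logic needs several statements, for example to randomize more than one variable.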

UseFastRestart: Option to toggle fast restart, specified as either "on" or "off". Fast restart allows you to perform iterative simulations without compiling a model or terminating the simulation each time.

For more information on fast restart, see How Fast Restart Improves Iterative Simulations (Simulink).
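
The following minimal sketch shows how you might toggle this option using dot notation. It uses a predefined pendulum environment only so that the snippet runs on its own; any SimulinkEnvWithAgent object can be used instead.

```matlab
% Create a predefined Simulink environment to experiment with.
env = rlPredefinedEnv("SimplePendulumModel-Continuous");

% Disable fast restart, for example if the model uses features
% that fast restart does not support.
env.UseFastRestart = "off";

% Re-enable fast restart to avoid recompiling the model between runs.
env.UseFastRestart = "on";
```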

Object Functions

train               Train reinforcement learning agents within a specified environment
sim                 Simulate trained reinforcement learning agents within a specified environment
getObservationInfo  Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
getActionInfo       Obtain action data specifications from reinforcement learning environment, agent, or experience buffer



Create a Simulink environment using the trained agent and corresponding Simulink model from the Control Water Level in a Tank Using a DDPG Agent example.

Load the agent in the MATLAB® workspace.

load rlWaterTankDDPGAgent

Create an environment for the rlwatertank model, which contains an RL Agent block. Since the agent used by the block is already in the workspace, you do not need to pass the observation and action specifications to create the environment.

env = rlSimulinkEnv("rlwatertank","rlwatertank/RL Agent")
env = 
SimulinkEnvWithAgent with properties:

           Model : rlwatertank
      AgentBlock : rlwatertank/RL Agent
        ResetFcn : []
  UseFastRestart : on

Validate the environment by performing a short simulation for two sample times.
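
One way to perform this check is the validateEnvironment function from Reinforcement Learning Toolbox, which runs a brief simulation and raises an error if it detects a problem. The sketch below recreates the environment so that it runs on its own.

```matlab
% Load the agent and recreate the environment from the preceding steps.
load rlWaterTankDDPGAgent
env = rlSimulinkEnv("rlwatertank","rlwatertank/RL Agent");

% Run a brief simulation to check that the model, agent block,
% and specifications are consistent. An error indicates a problem.
validateEnvironment(env)
```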


You can now train and simulate the agent within the environment by using train and sim, respectively.
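
As an illustration, the following sketch simulates the loaded agent for a limited number of steps. It assumes that the MAT-file stores the agent in a variable named agent, and the step limit of 200 is an arbitrary value chosen for this example.

```matlab
% Load the trained agent; this sketch assumes the MAT-file stores it
% in a variable named agent.
load rlWaterTankDDPGAgent
env = rlSimulinkEnv("rlwatertank","rlwatertank/RL Agent");

% Limit the simulation to 200 steps (illustrative value).
simOpts = rlSimulationOptions("MaxSteps",200);

% Simulate the agent within the environment.
experience = sim(env,agent,simOpts);

% Sum the rewards collected during the simulation.
totalReward = sum(experience.Reward.Data);
```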

For this example, consider the rlSimplePendulumModel Simulink® model. The model is a simple frictionless pendulum that initially hangs in a downward position.

Open the model.

mdl = "rlSimplePendulumModel";
open_system(mdl)

Create rlNumericSpec and rlFiniteSetSpec objects for the observation and action specifications, respectively.

The observation is a vector containing three signals: the sine, cosine, and time derivative of the angle.

obsInfo = rlNumericSpec([3 1]) 
obsInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: [0x0 string]
    Description: [0x0 string]
      Dimension: [3 1]
       DataType: "double"

The action is a scalar expressing the torque and can be one of three possible values, -2 Nm, 0 Nm and 2 Nm.

actInfo = rlFiniteSetSpec([-2 0 2])
actInfo = 
  rlFiniteSetSpec with properties:

       Elements: [3x1 double]
           Name: [0x0 string]
    Description: [0x0 string]
      Dimension: [1 1]
       DataType: "double"

You can use dot notation to assign property values for the rlNumericSpec and rlFiniteSetSpec objects.

obsInfo.Name = "observations";
actInfo.Name = "torque";

Assign the agent block path information, and create the reinforcement learning environment for the Simulink model using the information extracted in the previous steps.

agentBlk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl,agentBlk,obsInfo,actInfo)
env = 
SimulinkEnvWithAgent with properties:

           Model : rlSimplePendulumModel
      AgentBlock : rlSimplePendulumModel/RL Agent
        ResetFcn : []
  UseFastRestart : on

You can also specify a reset function using dot notation. For this example, randomly initialize theta0 in the model workspace.

env.ResetFcn = @(in) setVariable(in,"theta0",randn,"Workspace",mdl)
env = 
SimulinkEnvWithAgent with properties:

           Model : rlSimplePendulumModel
      AgentBlock : rlSimplePendulumModel/RL Agent
        ResetFcn : @(in)setVariable(in,"theta0",randn,"Workspace",mdl)
  UseFastRestart : on

Create an environment for the Simulink model from the example Train Multiple Agents to Perform Collaborative Task.

Load the file containing the agents. For this example, load the agents that have been already trained using decentralized learning.

load decentralizedAgents.mat

Create an environment for the rlCollaborativeTask model, which has two agent blocks. Since the agents used by the two blocks (agentA and agentB) are already in the workspace, you do not need to pass their observation and action specifications to create the environment.

env = rlSimulinkEnv( ...
    "rlCollaborativeTask", ...
    ["rlCollaborativeTask/Agent A","rlCollaborativeTask/Agent B"])
env = 
SimulinkEnvWithAgent with properties:

           Model : rlCollaborativeTask
      AgentBlock : [
                     rlCollaborativeTask/Agent A
                     rlCollaborativeTask/Agent B
                   ]
        ResetFcn : []
  UseFastRestart : on

It is good practice to specify a reset function for the environment such that agents start from random initial positions at the beginning of each episode. For an example, see the resetRobots function defined in Train Multiple Agents to Perform Collaborative Task.

You can now simulate or train the agents within the environment using sim or train, respectively.

Use the predefined "SimplePendulumModel-Continuous" keyword to create a continuous simple pendulum model reinforcement learning environment.

env = rlPredefinedEnv("SimplePendulumModel-Continuous")
env = 
SimulinkEnvWithAgent with properties:

           Model : rlSimplePendulumModel
      AgentBlock : rlSimplePendulumModel/RL Agent
        ResetFcn : []
  UseFastRestart : on
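
With a predefined environment, you do not define the specifications yourself, but you can retrieve them from the environment object. The following sketch uses getObservationInfo and getActionInfo for this purpose, for example to size the networks of an agent that you create later.

```matlab
% Create the predefined continuous pendulum environment.
env = rlPredefinedEnv("SimplePendulumModel-Continuous");

% Retrieve the observation and action specifications.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Display the dimensions of the observation and action channels.
disp(obsInfo.Dimension)
disp(actInfo.Dimension)
```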

This example shows how to use createIntegratedEnv to create an environment object from a Simulink model that implements the system the agent interacts with but that does not contain an agent block. Such a system is often referred to as the plant, open-loop system, or reference system, while the whole (integrated) system that includes the agent is often referred to as the closed-loop system.

For this example, use the flying robot model described in Train DDPG Agent to Control Sliding Robot as the reference (open-loop) system.

Open the flying robot model.
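
A minimal way to do this, assuming the reference model is named rlFlyingRobotEnv as in the createIntegratedEnv call in this example, is to use open_system.

```matlab
% Open the reference (open-loop) model in the Simulink editor.
open_system("rlFlyingRobotEnv")
```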


Initialize the state variables and sample time.

% initial model state variables
theta0 = 0;
x0 = -15;
y0 = 0;

% sample time
Ts = 0.4;

Create the Simulink model myIntegratedEnv containing the flying robot model connected in a closed loop to the agent block. The function also returns the reinforcement learning environment object env to be used for training.

env = createIntegratedEnv( ...
    "rlFlyingRobotEnv", ...
    "myIntegratedEnv")
env = 
SimulinkEnvWithAgent with properties:

           Model : myIntegratedEnv
      AgentBlock : myIntegratedEnv/RL Agent
        ResetFcn : []
  UseFastRestart : on

The function can also return the block path to the RL Agent block in the new integrated model, as well as the observation and action specifications for the reference model.

[~,agentBlk,observationInfo,actionInfo] = ...
    createIntegratedEnv( ...
    "rlFlyingRobotEnv", ...
    "myIntegratedEnv")
agentBlk = 
"myIntegratedEnv/RL Agent"
observationInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "observation"
    Description: [0x0 string]
      Dimension: [7 1]
       DataType: "double"

actionInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "action"
    Description: [0x0 string]
      Dimension: [2 1]
       DataType: "double"

Returning the block path and specifications is useful when you need to modify descriptions, limits, or names in observationInfo and actionInfo. After modifying the specifications, you can then create an environment from the integrated model myIntegratedEnv using the rlSimulinkEnv function.
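
The following sketch illustrates this workflow. It assumes that agentBlk, observationInfo, and actionInfo are in the workspace as returned by createIntegratedEnv, and that the myIntegratedEnv model is still loaded. The name and limits assigned here are illustrative values only.

```matlab
% Assumes agentBlk, observationInfo, and actionInfo were returned by
% createIntegratedEnv, and that the myIntegratedEnv model is loaded.

% Rename the observation channel and bound the actions.
% The name and the limits below are illustrative values only.
observationInfo.Name = "robot states";
actionInfo.LowerLimit = -1;
actionInfo.UpperLimit = 1;

% Create the environment from the integrated model using the
% modified specifications.
env = rlSimulinkEnv("myIntegratedEnv",agentBlk, ...
    observationInfo,actionInfo);
```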

Version History

Introduced in R2019a
