
rlAdditiveNoisePolicy

Policy object to generate continuous noisy actions for custom training loops

Since R2022a

    Description

This object implements an additive noise policy, which returns continuous deterministic actions with added noise, given an input observation. You can create an rlAdditiveNoisePolicy object from an rlContinuousDeterministicActor or extract it from an rlDDPGAgent or rlTD3Agent. You can then train the policy object using a custom training loop. If UseNoisyAction is set to false, the policy does not explore. This object is not compatible with generatePolicyBlock and generatePolicyFunction. For more information on policies and value functions, see Create Policies and Value Functions.
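For example, one way to obtain such a policy from an agent is to extract the agent's actor and wrap it in a policy object. The following is a minimal sketch; the observation and action specifications, and the use of a default DDPG agent, are placeholder choices for illustration.

% A minimal sketch, assuming continuous 4-D observation and 2-D action spaces.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
agent = rlDDPGAgent(obsInfo,actInfo);     % default DDPG agent with default networks
actor = getActor(agent);                  % extract the rlContinuousDeterministicActor
policy = rlAdditiveNoisePolicy(actor);    % additive noise policy for a custom training loop
policy.UseNoisyAction = false;            % with noise disabled, the policy is deterministic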

    Creation

    Description


    policy = rlAdditiveNoisePolicy(actor) creates the additive noise policy object policy from the continuous deterministic actor actor. It also sets the Actor property of policy to the input argument actor.


    policy = rlAdditiveNoisePolicy(actor,NoiseType=noiseType) specifies the type of noise distribution for the policy. noiseType can be either "gaussian" (Gaussian noise) or "ou" (Ornstein-Uhlenbeck noise). This syntax also sets the NoiseType property of policy to the input argument noiseType.
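For instance, assuming actor is an existing rlContinuousDeterministicActor (such as the one built in the examples below), the two creation syntaxes look like this:

policy1 = rlAdditiveNoisePolicy(actor);                  % default Gaussian noise model
policy2 = rlAdditiveNoisePolicy(actor,NoiseType="ou");   % Ornstein-Uhlenbeck noise model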

    Properties


Actor - Continuous deterministic actor, specified as an rlContinuousDeterministicActor object.

NoiseType - Noise type, specified as either "gaussian" (default, Gaussian noise) or "ou" (Ornstein-Uhlenbeck noise). For more information on noise models, see Noise Models.

    Example: "ou"

NoiseOptions - Noise model options, specified as a GaussianActionNoise object or an OrnsteinUhlenbeckActionNoise object. Changing the noise state or any noise option of an rlAdditiveNoisePolicy object deployed through code generation is not supported.

    For more information on noise models, see Noise Models.

EnableNoiseDecay - Option to enable noise decay, specified as a logical value: either true (default, noise decay is enabled) or false (noise decay is disabled).

    Example: false

UseNoisyAction - Option to enable noisy actions, specified as a logical value: either true (default, noise is added to the actions, which helps exploration) or false (no noise is added to the actions). When noise is not added to the actions, the policy is deterministic and therefore does not explore.

    Example: false

ObservationInfo - Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation channels.

ActionInfo - Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

    Note

    Only one action channel is allowed.

SampleTime - Sample time of the policy, specified as a positive scalar or as -1 (default). Setting this parameter to -1 allows for event-based simulations.

    Within a Simulink® environment, the RL Agent block in which the policy is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

    Within a MATLAB® environment, the policy is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience. If SampleTime is -1, the sample time is treated as being equal to 1.

    Example: 0.2
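As a quick illustration, the following sketch sets several of these properties through dot notation. It assumes policy is an existing rlAdditiveNoisePolicy that uses the Gaussian noise model (see the examples below for how such a policy is created); the StandardDeviationDecayRate option name refers to the GaussianActionNoise object and is included here as an assumption about that noise model.

% A minimal sketch, assuming policy is an rlAdditiveNoisePolicy using Gaussian noise.
policy.NoiseOptions.StandardDeviation = 0.3;             % exploration noise level
policy.NoiseOptions.StandardDeviationDecayRate = 1e-4;   % used when noise decay is enabled (assumed option name)
policy.EnableNoiseDecay = true;                          % decay the noise during training
policy.UseNoisyAction = true;                            % add noise to the actions (exploration on)
policy.SampleTime = 0.2;                                 % execute the policy every 0.2 seconds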

    Object Functions

getAction - Obtain action from agent, actor, or policy object given environment observations
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
reset - Reset environment, agent, experience buffer, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
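To show how these functions fit together in a custom training loop, here is a rough sketch. The environment env and the update function myUpdateParams are hypothetical placeholders, not toolbox functions, and the sketch assumes the value-object forms policy = reset(policy) and policy = setLearnableParameters(policy,params).

% A rough sketch of a custom training loop (env and myUpdateParams are hypothetical).
for episode = 1:100
    policy = reset(policy);                         % reset the noise state of the policy
    obs = reset(env);                               % hypothetical environment reset
    isDone = false;
    while ~isDone
        act = getAction(policy,{obs});              % noisy action for exploration
        [obs,reward,isDone] = step(env,act{1});     % hypothetical environment step
        % ... store the experience and compute an actor update here ...
        params = getLearnableParameters(policy);    % current actor parameters
        params = myUpdateParams(params,reward);     % hypothetical learning update
        policy = setLearnableParameters(policy,params);
    end
end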

    Examples


    Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([2 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
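For example, you could query one of the predefined MATLAB environments instead (this sketch would replace the specifications defined above, and its dimensions differ from those used in the rest of this example):

env = rlPredefinedEnv("DoubleIntegrator-Continuous");  % one possible predefined environment
obsInfo = getObservationInfo(env);   % observation specifications of the environment
actInfo = getActionInfo(env);        % action specifications of the environment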

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(actInfo.Dimension(1)) 
        ];

    Convert the network to a dlnetwork object and display the number of weights.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 114
    
       Inputs:
          1   'input'   4 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {4x1 cell}
                  State: {0x1 cell}
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2x1 single column vector
    
        0.4013
        0.0578
    
    

    Create a policy object from actor.

    policy = rlAdditiveNoisePolicy(actor)
    policy = 
      rlAdditiveNoisePolicy with properties:
    
                   Actor: [1x1 rl.function.rlContinuousDeterministicActor]
               NoiseType: "gaussian"
            NoiseOptions: [1x1 rl.option.GaussianActionNoise]
        EnableNoiseDecay: 1
           Normalization: "none"
          UseNoisyAction: 1
         ObservationInfo: [1x1 rl.util.rlNumericSpec]
              ActionInfo: [1x1 rl.util.rlNumericSpec]
              SampleTime: -1
    
    

    You can access the policy options using dot notation. For example, change the upper and lower limits of the distribution.

    policy.NoiseOptions.LowerLimit = -3;
    policy.NoiseOptions.UpperLimit = 3;

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1
    
        0.1878
       -0.1645
    
    

    You can now train the policy with a custom training loop and then deploy it to your application.

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous three- and one-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([3 1]);
    actInfo = rlNumericSpec([1 1]);

Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(9)
        reluLayer
        fullyConnectedLayer(actInfo.Dimension(1)) 
        ];

    Convert the network to a dlnetwork object and display the number of weights.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 46
    
       Inputs:
          1   'input'   3 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {4x1 cell}
                  State: {0x1 cell}
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = single
        -0.2535
    

Create a policy object from actor, specifying an Ornstein-Uhlenbeck noise model.

    policy = rlAdditiveNoisePolicy(actor,NoiseType="ou")
    policy = 
      rlAdditiveNoisePolicy with properties:
    
                   Actor: [1x1 rl.function.rlContinuousDeterministicActor]
               NoiseType: "ou"
            NoiseOptions: [1x1 rl.option.OrnsteinUhlenbeckActionNoise]
        EnableNoiseDecay: 1
           Normalization: "none"
          UseNoisyAction: 1
         ObservationInfo: [1x1 rl.util.rlNumericSpec]
              ActionInfo: [1x1 rl.util.rlNumericSpec]
              SampleTime: -1
    
    

    You can access the policy options using dot notation. For example, change the standard deviation of the distribution.

    policy.NoiseOptions.StandardDeviation = 0.6;

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = -0.1625
    

    You can now train the policy with a custom training loop and then deploy it to your application.

    Version History

    Introduced in R2022a