rlDDPGAgentOptions

Create options for DDPG agent

Syntax

opt = rlDDPGAgentOptions
opt = rlDDPGAgentOptions(Name,Value)

Description

opt = rlDDPGAgentOptions creates an rlDDPGAgentOptions object for use as an argument when creating a DDPG agent using all default options. You can modify the object properties using dot notation.

opt = rlDDPGAgentOptions(Name,Value) creates a DDPG options object using the specified name-value pairs to override default property values.

Examples


Create an rlDDPGAgentOptions object that specifies the mini-batch size.

opt = rlDDPGAgentOptions('MiniBatchSize',48)
opt = 

  rlDDPGAgentOptions with properties:

                           NoiseOptions: [1×1 rl.option.OrnsteinUhlenbeckActionNoise]
                     TargetSmoothFactor: 1.0000e-03
                  TargetUpdateFrequency: 4
                     TargetUpdateMethod: "smoothing"
    ResetExperienceBufferBeforeTraining: 1
          SaveExperienceBufferWithAgent: 0
                          MiniBatchSize: 48
                    NumStepsToLookAhead: 1
                 ExperienceBufferLength: 10000
                             SampleTime: 1
                         DiscountFactor: 0.9900

You can modify options using dot notation. For example, set the agent sample time to 0.5.

opt.SampleTime = 0.5;

Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: "MiniBatchSize",24

Options for noise, specified as the comma-separated pair consisting of 'NoiseOptions' and an OrnsteinUhlenbeckActionNoise object with the following numeric properties.

Property                    Description
InitialAction               Initial value of action for noise model
Mean                        Noise model mean
MeanAttractionConstant      Constant specifying how quickly the noise model output is attracted to the mean
Variance                    Noise model variance
VarianceDecayRate           Decay rate of the variance
SampleTime                  Sample time of the noise model update

At each sample time step, the noise model is updated using the following formula:

x(k) = x(k-1) + MeanAttractionConstant.*(Mean - x(k-1)).*SampleTime
       + Variance.*randn(size(Mean)).*sqrt(SampleTime)
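
The following sketch illustrates one update step of this noise model for a scalar action; the variable values are illustrative placeholders, not toolbox defaults.

% Sketch of a single Ornstein-Uhlenbeck noise update (illustrative values)
Mean = 0;                        % noise model mean
MeanAttractionConstant = 0.15;   % attraction toward the mean
Variance = 0.3;                  % noise model variance
SampleTime = 0.5;                % noise model sample time
x = 0;                           % previous noise value, x(k-1)

x = x + MeanAttractionConstant.*(Mean - x).*SampleTime ...
      + Variance.*randn(size(Mean)).*sqrt(SampleTime);  % new noise value, x(k)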

To specify noise options, use dot notation after creating the rlDDPGAgentOptions object. For example, set the noise mean to 0.5.

opt = rlDDPGAgentOptions;
opt.NoiseOptions.Mean = 0.5;

For continuous action signals, it is important to set the noise variance appropriately to encourage exploration. For example, if the steering angle is bounded to the range [-0.26,0.26] radians, a noise variance of 0.5 is much larger than the action range, so the noise dominates the action signal and the agent does not learn.

If your agent converges on local optima too quickly, promote agent exploration by increasing the amount of noise.
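
For example, one way to scale exploration to the action range is shown in the following sketch; the 0.1 factor and decay rate are illustrative choices, not toolbox recommendations.

% Scale the exploration noise to the action range (illustrative values)
opt = rlDDPGAgentOptions;
actionRange = 0.26 - (-0.26);                   % steering angle range, in radians
opt.NoiseOptions.Variance = 0.1*actionRange;    % noise variance well within the action range
opt.NoiseOptions.VarianceDecayRate = 1e-5;      % slowly reduce exploration during training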

Smoothing factor for target actor and critic updates, specified as the comma-separated pair consisting of 'TargetSmoothFactor' and a double. The smoothing factor determines how the target properties are updated when TargetUpdateMethod is "smoothing".

Number of episodes between target actor and critic updates, specified as the comma-separated pair consisting of 'TargetUpdateFrequency' and a positive integer value. This option applies only when TargetUpdateMethod is "periodic".

Strategy for updating target actor and critic properties using values from the trained actor and critic, specified as the comma-separated pair consisting of 'TargetUpdateMethod' and one of the following:

  • "smoothing" — Update the target actor and critic properties, thetaTarget, at every training episode according to the following formula, where theta contains the current trained network properties.

    thetaTarget = TargetSmoothFactor*theta + (1 - TargetSmoothFactor)*thetaTarget
  • "periodic" — Update the target actor and critic properties every TargetUpdateFrequency training episodes.

Flag for clearing the experience buffer before training, specified as the comma-separated pair consisting of 'ResetExperienceBufferBeforeTraining' and a logical true or false.

Flag for saving the experience buffer data when saving the agent, specified as the comma-separated pair consisting of 'SaveExperienceBufferWithAgent' and a logical true or false. This option applies both when saving candidate agents during training and when saving agents using the save function.

For some agents, such as those with a large experience buffer and image-based observations, the memory required to save the experience buffer is large. In such cases, to avoid saving the experience buffer data, set SaveExperienceBufferWithAgent to false.

If you plan to further train your saved agent, you can start training with the previous experience buffer as a starting point. In this case, set SaveExperienceBufferWithAgent to true.
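
For example, the following settings keep the buffer with the saved agent and reuse it when training resumes.

% Keep the experience buffer with the agent and reuse it in later training
opt = rlDDPGAgentOptions;
opt.SaveExperienceBufferWithAgent = true;          % save collected experiences with the agent
opt.ResetExperienceBufferBeforeTraining = false;   % do not clear the buffer when training restarts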

Size of random experience mini-batch, specified as the comma-separated pair consisting of 'MiniBatchSize' and a positive numeric value. During each training episode, the agent randomly samples experiences from the experience buffer when computing gradients for updating the critic properties. Large mini-batches reduce the variance when computing gradients but increase the computational effort.

Number of steps to look ahead during training, specified as the comma-separated pair consisting of 'NumStepsToLookAhead' and a numeric positive integer value.

Experience buffer size, specified as the comma-separated pair consisting of 'ExperienceBufferLength' and a numeric positive integer value. During training, the agent updates the actor and critic using a mini-batch of experiences randomly sampled from the buffer.

Sample time of agent, specified as the comma-separated pair consisting of 'SampleTime' and a numeric value.

Discount factor applied to future rewards during training, specified as the comma-separated pair consisting of 'DiscountFactor' and a positive numeric value less than or equal to 1.
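
The experience-replay and training options can also be set together when creating the object; the values below are illustrative, not recommendations.

% Combine several training options in one call (illustrative values)
opt = rlDDPGAgentOptions( ...
    'MiniBatchSize',64, ...             % experiences sampled per gradient update
    'ExperienceBufferLength',1e6, ...   % maximum number of stored experiences
    'NumStepsToLookAhead',1, ...        % one-step lookahead targets
    'SampleTime',0.1, ...               % agent sample time
    'DiscountFactor',0.99);             % weight on future rewards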

Output Arguments


DDPG agent options, returned as an rlDDPGAgentOptions object. The object properties are described in Name-Value Pair Arguments.

Introduced in R2019a