
# rlPGAgentOptions

Create options for PG agent

## Syntax

`opt = rlPGAgentOptions`
`opt = rlPGAgentOptions(Name,Value)`

## Description


`opt = rlPGAgentOptions` creates an `rlPGAgentOptions` object for use as an argument when creating a PG agent using all default settings. You can modify the object properties using dot notation.

`opt = rlPGAgentOptions(Name,Value)` creates a PG options object using the specified name-value pairs to override default property values.

## Examples


Create a PG agent options object, specifying the discount factor.

`opt = rlPGAgentOptions('DiscountFactor',0.9)`
```
opt =
  rlPGAgentOptions with properties:

          UseBaseline: 1
    EntropyLossWeight: 0
           SampleTime: 1
       DiscountFactor: 0.9000
```

You can modify options using dot notation. For example, set the agent sample time to `0.5`.

`opt.SampleTime = 0.5;`

## Input Arguments


### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `"DiscountFactor",0.95`

Instruction to use a baseline for learning, specified as the comma-separated pair consisting of `'UseBaseline'` and logical `true` or `false`. When `UseBaseline` is `true`, you must specify a critic network as the baseline function approximator.

In general, for simpler problems with smaller actor networks, PG agents work better without a baseline.
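For such a case, a minimal sketch of disabling the baseline when constructing the options object (the discount factor value is an illustrative assumption):

```matlab
% Sketch: disable the baseline critic for a simple problem with a small actor network
% (with UseBaseline set to false, no critic network is required when creating the agent)
opt = rlPGAgentOptions('UseBaseline',false);
opt.DiscountFactor = 0.99;   % other properties can still be adjusted via dot notation
```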

Sample time of the agent, specified as the comma-separated pair consisting of `'SampleTime'` and a numeric value.

Discount factor applied to future rewards during training, specified as the comma-separated pair consisting of `'DiscountFactor'` and a positive numeric value less than or equal to 1.

Entropy loss weight, specified as the comma-separated pair consisting of `'EntropyLossWeight'` and a scalar value between `0` and `1`. A higher loss weight value promotes agent exploration by applying a penalty for being too certain about which action to take. Doing so can help the agent move out of local optima.

The entropy loss function for episode step t is:

$H_t = E\sum_{k=1}^{M} \mu_k\left(S_t|\theta_\mu\right)\ln \mu_k\left(S_t|\theta_\mu\right)$

Here:

• E is the entropy loss weight.

• M is the number of possible actions.

• μ_k(S_t|θ_μ) is the probability of taking action A_k following the current policy.

When gradients are computed during training, an additional gradient component is computed for minimizing this loss function.
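As a rough numeric illustration of the formula above (the action probabilities are assumed values for a small discrete action space, not output from any particular agent):

```matlab
% Numeric sketch of the entropy loss for one episode step, assuming a discrete
% policy over M = 3 actions (the mu values are made up for illustration)
E  = 0.01;                    % entropy loss weight
mu = [0.7 0.2 0.1];           % mu_k(S_t): action probabilities from the current policy
Ht = E * sum(mu .* log(mu));  % entropy loss H_t; minimizing it penalizes overconfident policies
```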

## Output Arguments


PG agent options, returned as an `rlPGAgentOptions` object. The object properties are described in Name-Value Pair Arguments.