rlTRPOAgent
Description
Trust region policy optimization (TRPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. This algorithm prevents significant performance drops compared to standard policy gradient methods by keeping the updated policy within a trust region close to the current policy. The action space can be either discrete or continuous. For continuous action spaces, this agent does not enforce constraints set in the action specification; therefore, if you need to enforce action constraints, you must do so within the environment.
Note
TRPO agents do not support recurrent networks.
For more information on TRPO agents, see Trust Region Policy Optimization (TRPO) Agents. For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.
Creation
Syntax
Description
Create Agent from Observation and Action Specifications
agent = rlTRPOAgent(observationInfo,actionInfo) creates a trust region policy optimization (TRPO) agent for an environment with the given observation and action specifications, using default initialization options. The actor and critic in the agent use default deep neural networks built from the observation specification observationInfo and the action specification actionInfo. The ObservationInfo and ActionInfo properties of agent are set to the observationInfo and actionInfo input arguments, respectively.
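As a minimal sketch of this syntax, the following creates a TRPO agent with default networks and queries it for one action. The observation and action specifications are hypothetical (a four-element observation vector and three discrete actions) and are chosen only for illustration.

```matlab
% Hypothetical specifications, chosen only for illustration:
% a 4-by-1 continuous observation and three discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Create a TRPO agent with default actor and critic networks.
agent = rlTRPOAgent(obsInfo,actInfo);

% Query the agent for an action given a random observation.
% Observations are passed, and actions returned, as cell arrays.
action = getAction(agent,{rand(obsInfo.Dimension)});
```

Because the policy of a TRPO agent is stochastic, repeated calls to getAction with the same observation can return different actions.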
agent = rlTRPOAgent(observationInfo,actionInfo,initOpts) creates a TRPO agent for an environment with the given observation and action specifications. The agent uses default networks configured using options specified in the initOpts object. TRPO agents do not support recurrent neural networks. For more information on the initialization options, see rlAgentInitializationOptions.
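A short sketch of this syntax follows, assuming you want smaller hidden layers than the default. The specifications and the value 128 are hypothetical choices for illustration; NumHiddenUnit is a property of rlAgentInitializationOptions.

```matlab
% Hypothetical specifications: 4-by-1 observation, scalar continuous action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Request 128 units in each hidden fully connected layer of the
% default networks (an illustrative value, not a recommendation).
initOpts = rlAgentInitializationOptions(NumHiddenUnit=128);

% Create the agent using the initialization options.
agent = rlTRPOAgent(obsInfo,actInfo,initOpts);
```

The Name=Value argument form used here requires R2021a or later. In earlier code styles, the equivalent call is rlAgentInitializationOptions("NumHiddenUnit",128).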
Create Agent from Actor and Critic
agent = rlTRPOAgent(actor,critic) creates a TRPO agent with the specified actor and critic, using the default options for the agent.
Specify Agent Options
agent = rlTRPOAgent(___,agentOptions) creates a TRPO agent and sets the AgentOptions property to the agentOptions input argument. Use this syntax after any of the input arguments in the previous syntaxes.
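The following sketch shows one way to pass agent options at creation. The specifications and option values are hypothetical and chosen only for illustration; DiscountFactor and KLDivergenceLimit are properties of rlTRPOAgentOptions.

```matlab
% Hypothetical specifications, chosen only for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Illustrative option values (not tuned recommendations).
agentOpts = rlTRPOAgentOptions( ...
    DiscountFactor=0.95, ...
    KLDivergenceLimit=0.02);

% Append the options object after the specification arguments.
agent = rlTRPOAgent(obsInfo,actInfo,agentOpts);

% The options are stored in the AgentOptions property.
gamma = agent.AgentOptions.DiscountFactor;
```

You can also adjust options after creation using dot notation, for example agent.AgentOptions.DiscountFactor = 0.9.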
Input Arguments
Properties
Object Functions
train | Train reinforcement learning agents within a specified environment |
sim | Simulate trained reinforcement learning agents within specified environment |
getAction | Obtain action from agent, actor, or policy object given environment observations |
getActor | Extract actor from reinforcement learning agent |
setActor | Set actor of reinforcement learning agent |
getCritic | Extract critic from reinforcement learning agent |
setCritic | Set critic of reinforcement learning agent |
generatePolicyFunction | Generate MATLAB function that evaluates policy of an agent or policy object |
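As a hedged illustration of how several of these functions fit together, the sketch below extracts the actor and critic from a default agent, evaluates the critic for one observation, and writes the actor back. The specifications are hypothetical. The sketch assumes a release in which the default critic is an rlValueFunction object, and it uses getValue, which is not listed in the table above.

```matlab
% Hypothetical specifications, chosen only for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);
agent = rlTRPOAgent(obsInfo,actInfo);

% Extract the function approximators from the agent.
actor = getActor(agent);
critic = getCritic(agent);

% Evaluate the critic (state-value estimate) for a random observation.
value = getValue(critic,{rand(obsInfo.Dimension)});

% After modifying an actor (not shown here), store it back in the agent.
% setActor returns the updated agent.
agent = setActor(agent,actor);
```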
Examples
Tips
For continuous action spaces, this agent does not enforce the constraints set by the action specification. In this case, you must enforce action space constraints within the environment.
While tuning the learning rate of the actor network is necessary for PPO agents, it is not necessary for TRPO agents.
For high-dimensional observations, such as images, use PPO, SAC, or TD3 agents instead.
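The first tip above, enforcing action constraints inside the environment, can be sketched as a custom step function of the kind used with rlFunctionEnv. Everything in this example is hypothetical: the function name clippedStep, the limit uMax, the State and AppliedAction fields, and the placeholder dynamics and reward are assumptions made only to show where the saturation goes.

```matlab
function [nextObs,reward,isDone,loggedSignals] = clippedStep(action,loggedSignals)
% Hypothetical step function that saturates the action before applying it.

    uMax = 2;                         % assumed action limit
    u = min(max(action,-uMax),uMax);  % enforce the constraint here

    % Placeholder scalar dynamics, for illustration only.
    x = loggedSignals.State + 0.1*u;

    % Store the new state and the action that was actually applied.
    loggedSignals.State = x;
    loggedSignals.AppliedAction = u;

    nextObs = x;
    reward = -x^2;          % placeholder reward
    isDone = abs(x) > 10;   % placeholder termination condition
end
```

With this design, the agent can still output actions outside the limits, but the environment only ever applies the saturated value.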
Version History
Introduced in R2021b
See Also
Apps
Functions
getAction | getActor | getCritic | getModel | generatePolicyFunction | generatePolicyBlock | getActionInfo | getObservationInfo
Objects
rlTRPOAgentOptions | rlAgentInitializationOptions | rlValueFunction | rlDiscreteCategoricalActor | rlContinuousGaussianActor | rlACAgent | rlPGAgent | rlPPOAgent