Agents
Create and configure reinforcement learning agents
A reinforcement learning agent receives observations and a reward from the environment, and returns an action to the environment. During training, the agent continuously updates its parameters to improve its policy for the given environment.
Reinforcement Learning Toolbox™ software provides built-in reinforcement learning agents that use several common algorithms, such as Q-learning, DQN, PG, AC, DDPG, TD3, SAC, and PPO. You can also implement your own custom agents.
For an introduction to agents, see Reinforcement Learning Agents. For an introduction to policies, value functions, actors and critics, see Create Policies and Value Functions.
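As an illustration of the interaction described above, the following minimal sketch implements a tabular Q-learning agent in Python. It is not the toolbox API. The class `TabularQAgent`, the toy environment `chain_step`, the helper `train`, and all hyperparameter values are hypothetical and chosen only for this example. The sketch shows the two roles of an agent: returning an action for an observation, and updating its parameters (here, a table of Q-values) from the reward it receives.

```python
import random


class TabularQAgent:
    """Hypothetical minimal agent (not a toolbox class)."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9,
                 epsilon=0.2, seed=0):
        # The agent's learnable parameters: one Q-value per (state, action).
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs, explore=True):
        """Return an action for the given observation (epsilon-greedy policy)."""
        row = self.q[obs]
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        best = max(row)
        # Break ties at random so the untrained agent still explores.
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def learn(self, obs, action, reward, next_obs, done):
        """Update the parameters from one experience (Q-learning rule)."""
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def chain_step(state, action, n_states=5):
    """Toy environment (an assumption for this sketch): action 1 moves right,
    action 0 moves left, and reaching the right end gives reward 1."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(agent, episodes=200, max_steps=50):
    """Run the observe, act, learn loop for a number of episodes."""
    for _ in range(episodes):
        obs = 0
        for _ in range(max_steps):
            action = agent.act(obs)                           # agent -> environment
            next_obs, reward, done = chain_step(obs, action)  # environment -> agent
            agent.learn(obs, action, reward, next_obs, done)  # parameter update
            obs = next_obs
            if done:
                break
    return agent
```

Under these assumptions, calling `train(TabularQAgent(n_states=5, n_actions=2))` and then `act(s, explore=False)` for each nonterminal state `s` should select the move-right action, which is the policy the agent has improved toward during training.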
Apps
| Reinforcement Learning Designer | Design, train, and simulate reinforcement learning agents (Since R2021a) | 
Blocks
| RL Agent | Reinforcement learning agent | 
Topics
Agent Basics
- Reinforcement Learning Agents
 You can create an agent using one of several standard reinforcement learning algorithms or define your own custom agent.
- Create Agents Using Reinforcement Learning Designer
 Interactively create or import agents for training using the Reinforcement Learning Designer app.
Agent Types
- Q-Learning Agent
 Q-learning agent description and algorithm.
- SARSA Agent
 SARSA agent description and algorithm.
- LSPI Agent
 LSPI agent description and algorithm.
- Deep Q-Network (DQN) Agent
 DQN agent description and algorithm.
- REINFORCE Policy Gradient (PG) Agent
 Vanilla policy gradient agent description and algorithm.
- Actor-Critic (AC) Agent
 Actor-critic agent description and algorithm.
- Proximal Policy Optimization (PPO) Agent
 PPO agent description and algorithm.
- Trust Region Policy Optimization (TRPO) Agent
 TRPO agent description and algorithm.
- Deep Deterministic Policy Gradient (DDPG) Agent
 DDPG agent description and algorithm.
- Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agent
 TD3 agent description and algorithm.
- Soft Actor-Critic (SAC) Agent
 SAC agent description and algorithm.
- Model-Based Policy Optimization (MBPO) Agent
 A model-based policy optimization (MBPO) agent learns a model of its environment, which it can use to generate additional experiences for training.
Custom Agents
- Create Custom Reinforcement Learning Agents
 Create agents that use custom reinforcement learning algorithms.
- Create and Train Custom PG Agent
 Create a custom PG agent and train it using the built-in train function.
- Create and Train Custom LQR Agent
 Create a custom agent that solves an LQR problem and train it using the built-in train function.