reset

Reset environment, agent, experience buffer, or policy object

Since R2022a

    Description

    initialObs = reset(env) resets the specified MATLAB® environment to an initial state and returns the resulting initial observation value.

    Do not use the reset function for Simulink® environments, which are implicitly reset when running a new simulation. Instead, customize the reset behavior using the ResetFcn property of the environment.
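    For example, a randomized initial condition can be injected through ResetFcn. A minimal sketch, in which the model name "myModel", its agent block path, and the workspace variable "x0" are hypothetical:

```matlab
% Hypothetical model "myModel" containing an agent block "myModel/RL Agent".
env = rlSimulinkEnv("myModel","myModel/RL Agent");

% ResetFcn receives and must return the Simulink.SimulationInput object
% used for the next simulation. Here it randomizes the model workspace
% variable x0 before each training episode or simulation.
env.ResetFcn = @(in) setVariable(in,"x0",0.1*randn,Workspace="myModel");
```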

    reset(agent) resets the specified agent. Resetting a built-in agent performs the following actions, if applicable.

    • Empties the experience buffer.

    • Sets the recurrent neural network states of the actor and critic networks to zero.

    • Resets the states of any noise models used by the agent.

    agent = reset(agent) also returns the reset agent as an output argument.

    resetPolicy = reset(policy) returns the policy object resetPolicy in which any recurrent neural network states are set to zero and any noise model states are set to their initial conditions. This syntax has no effect if the policy object does not use a recurrent neural network and does not have a noise model with state.

    reset(buffer) resets the specified replay memory buffer by removing all the experiences.

    Examples

    Reset Environment

    Create a reinforcement learning environment. For this example, create a cart-pole system with a continuous action space.

    env = rlPredefinedEnv("CartPole-Continuous");

    Reset the environment and return the initial observation.

    initialObs = reset(env)
    initialObs = 4×1
    
             0
             0
        0.0315
             0
    
    

    Reset Agent

    Create observation and action specifications.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1]);

    Create a default DDPG agent from these specifications. To make the agent networks recurrent, use an agent initialization options object.

    initOptions = rlAgentInitializationOptions(UseRNN=true);
    agent = rlDDPGAgent(obsInfo,actInfo,initOptions);

    Reset the agent.

    agent = reset(agent);

    Reset Experience Buffer

    Create observation and action specifications. Specify finite upper limits so that the random experiences generated below remain finite.

    obsInfo = rlNumericSpec([4 1],UpperLimit=[1;5;10;1]);
    actInfo = rlNumericSpec([1 1],UpperLimit=5);

    Create a replay memory experience buffer.

    buffer = rlReplayMemory(obsInfo,actInfo,10000);

    Add experiences to the buffer. For this example, add 20 random experiences.

    for i = 1:20
        expBatch(i).Observation = {obsInfo.UpperLimit.*rand(4,1)};
        expBatch(i).Action = {actInfo.UpperLimit.*rand(1,1)};
        expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(4,1)};
        expBatch(i).Reward = 10*rand(1);
        expBatch(i).IsDone = 0;
    end
    expBatch(20).IsDone = 1;
    
    append(buffer,expBatch);

    Reset and clear the buffer.

    reset(buffer)
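    To confirm that the buffer is now empty, you can retrieve all stored experiences (this assumes your release provides the allExperiences function):

```matlab
% After reset, the buffer holds no experiences, so the result is empty.
exps = allExperiences(buffer);
isempty(exps)
```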

    Reset Policy

    Create observation and action specifications.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    To approximate the Q-value function within the critic, use a deep neural network. Create each network path as an array of layer objects.

    % Create Paths
    obsPath = [featureInputLayer(4) 
               fullyConnectedLayer(1,Name="obsout")];
    
    actPath = [featureInputLayer(1) 
               fullyConnectedLayer(1,Name="actout")];
    
    comPath = [additionLayer(2,Name="add")  ...
               fullyConnectedLayer(1)];
    
    % Create dlnetwork object and add Layers
    net = dlnetwork;
    net = addLayers(net,obsPath); 
    net = addLayers(net,actPath); 
    net = addLayers(net,comPath);
    net = connectLayers(net,"obsout","add/in1");
    net = connectLayers(net,"actout","add/in2");
    
    % Initialize network
    net = initialize(net);
    
    % Display the number of weights
    summary(net)
       Initialized: true
    
       Number of learnables: 9
    
       Inputs:
          1   'input'     4 features
          2   'input_1'   1 features
    

    Create an epsilon-greedy policy object using a Q-value function approximator.

    critic = rlQValueFunction(net,obsInfo,actInfo);
    policy = rlEpsilonGreedyPolicy(critic)
    policy = 
      rlEpsilonGreedyPolicy with properties:
    
                QValueFunction: [1×1 rl.function.rlQValueFunction]
            ExplorationOptions: [1×1 rl.option.EpsilonGreedyExploration]
                 Normalization: ["none"    "none"]
        UseEpsilonGreedyAction: 1
            EnableEpsilonDecay: 1
               ObservationInfo: [1×1 rl.util.rlNumericSpec]
                    ActionInfo: [1×1 rl.util.rlFiniteSetSpec]
                    SampleTime: -1
    
    

    Reset the policy.

    policy = reset(policy);

    Input Arguments

    Environment, specified as follows:

    • MATLAB environment, represented by one of the following objects.

      Among the MATLAB environments, only rlMultiAgentFunctionEnv and rlTurnBasedFunctionEnv support training multiple agents at the same time.

    • Simulink environment, represented by a SimulinkEnvWithAgent object, and created using:

      • rlSimulinkEnv — This environment is created from a model that already contains one or more agent blocks, and supports training multiple agents at the same time.

      • createIntegratedEnv — This environment is created from a model that does not already contain an agent block, and does not support training multiple agents at the same time.

      A Simulink-based environment object acts as an interface so that the reinforcement learning simulation or training function calls the (compiled) Simulink model to generate experiences for the agents. Such an environment does not support using the reset and step functions.

    Note

    env is a handle object, so a function that does not return it as an output argument, such as train, can still update its internal states. For more information about handle objects, see Handle Object Behavior.

    For more information on reinforcement learning environments, see Reinforcement Learning Environments and Create Custom Simulink Environments.

    Example: env = rlPredefinedEnv("DoubleIntegrator-Continuous") creates a predefined environment that implements a continuous-action double-integrator system and assigns it to the variable env.

    Agent, specified as one of the following reinforcement learning agent objects:

    Note

    agent is a handle object, so a function that does not return it as an output argument, such as train, can still update it. For more information about handle objects, see Handle Object Behavior.
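    Because of this handle behavior, capturing the output of reset is optional. A minimal sketch, assuming a default DDPG agent with recurrent networks so that reset has state to clear:

```matlab
% Create a default DDPG agent with recurrent actor and critic networks.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);
agent = rlDDPGAgent(obsInfo,actInfo,rlAgentInitializationOptions(UseRNN=true));

% agent is a handle object: this call zeros its recurrent network states
% in place, even though no output argument is captured.
reset(agent)
```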

    For more information on reinforcement learning agents, see Reinforcement Learning Agents.

    Example: agent = rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1])) creates the default rlPPOAgent object agent for an environment with an observation channel carrying a continuous two-element vector and an action channel carrying a continuous scalar.

    Experience buffer, specified as one of the following replay memory objects.

    Example: rlReplayMemory(rlNumericSpec([1 1]),rlFiniteSetSpec([0 1]),1e5)

    Policy, specified as a policy object, such as rlEpsilonGreedyPolicy or rlAdditiveNoisePolicy.

    For more information on reinforcement learning policies, see Create Actors, Critics, and Policy Objects.

    Example: policy = getExplorationPolicy(rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1]))) extracts the object that implements the exploration policy from a default PPO agent and assigns it to the variable policy.

    Output Arguments

    Initial environment observation after reset, returned as one of the following:

    • Array with dimensions matching the observation specification for an environment with a single observation channel.

    • Cell array with length equal to the number of observation channels for an environment with multiple observation channels. Each element of the cell array contains an array with dimensions matching the corresponding element of the environment observation specifications.

    Reset policy, returned as a policy object of the same type as policy, in which any recurrent neural network states are set to zero and any noise model states are set to their initial conditions.

    Reset agent, returned as an agent object. Note that agent is a handle object. Therefore, if it contains any recurrent neural network, its state is reset whether agent is returned as an output argument or not. For more information about handle objects, see Handle Object Behavior.

    Version History

    Introduced in R2022a
