getAction

Obtain action from agent or actor given environment observations

Description

Agent

agentAction = getAction(agent,obs) returns the action derived from the policy of a reinforcement learning agent, given environment observations. If agent contains a recurrent neural network, its state is updated.

[agentAction,agent] = getAction(agent,obs) also returns the updated agent as an output argument.
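For example, a minimal sketch (not part of the original reference page) of this syntax, assuming a hypothetical agent myRecurrentAgent that contains a recurrent neural network and an observation specification array obsInfo:

% Hypothetical recurrent agent: also return the agent itself. Because agents
% are handle objects, the internal state is updated whether or not you do so.
obs = {rand(obsInfo(1).Dimension)};
[act,myRecurrentAgent] = getAction(myRecurrentAgent,obs);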

Actor Representation

actorAction = getAction(actor,obs) returns the action derived from the policy represented by the actor actor, given environment observations obs.

[actorAction,nextState] = getAction(actor,obs) also returns the updated state of the actor when it uses a recurrent neural network.
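For example, a minimal sketch (not part of the original reference page) of this syntax, assuming a hypothetical actor myLSTMActor that uses a recurrent neural network and an observation specification array obsInfo:

% Hypothetical recurrent actor: also return its updated state, then apply it
% with setState before the next call if you manage the state yourself.
[act,nextState] = getAction(myLSTMActor,{rand(obsInfo(1).Dimension)});
myLSTMActor = setState(myLSTMActor,nextState);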

Examples

Create an environment with a discrete action space, and obtain its observation and action specifications. For this example, load the environment used in the example Create Agent Using Deep Network Designer and Train Using Image Observations.

% load predefined environment
env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

Obtain the observation and action specifications for this environment.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a TRPO agent from the environment observation and action specifications.

agent = rlTRPOAgent(obsInfo,actInfo);

Use getAction to return the action from a random observation.

getAction(agent, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
ans = 1x1 cell array
    {[-2]}

You can also obtain actions for a batch of observations. For example, obtain actions for a batch of 10 observations.

actBatch = getAction(agent, ...
    {rand([obsInfo(1).Dimension 10]), ...
     rand([obsInfo(2).Dimension 10])});
size(actBatch{1})
ans = 1×3

     1     1    10

actBatch{1}(1,1,7)
ans = -2

actBatch contains one action for each observation in the batch.

Create observation and action information. You can also obtain these specifications from an environment.

obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);

Create a deep neural network for the actor.

net = [featureInputLayer(obsinfo.Dimension(1), ...
           'Normalization','none','Name','state')
       fullyConnectedLayer(10,'Name','fc1')
       reluLayer('Name','relu1')
       fullyConnectedLayer(20,'Name','fc2')
       fullyConnectedLayer(actinfo.Dimension(1),'Name','fc3')
       tanhLayer('Name','tanh1')];
net = dlnetwork(net);

Create a continuous deterministic actor using this network.

actor = rlContinuousDeterministicActor(net, ...
    obsinfo,actinfo,...
    'ObservationInputNames',{'state'});

Obtain an action from this actor for a random batch of 10 observations.

act = getAction(actor,{rand(4,1,10)})
act = 1x1 cell array
    {2x1x10 single}

act is a single-element cell array that contains a 2-by-1-by-10 array, holding the two-element action computed for each of the 10 observations in the batch.

act{1}(:,1,7)
ans = 2x1 single column vector

    0.2643
   -0.2934

Input Arguments

Reinforcement learning agent, specified as a reinforcement learning agent object (for example, an rlTRPOAgent or rlDDPGAgent object).

Note

agent is a handle object. Therefore, if it contains any recurrent neural network, its internal state is updated by getAction whether or not agent is returned as an output argument. For more information about handle objects, see Handle Object Behavior.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If the agent or actor has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If the agent or actor does not use a recurrent neural network, then LS = 1. If the agent or actor has multiple observation input channels, then LS must be the same for all elements of obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
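As an illustration (not part of the original reference page), the following sketch builds obs for a single hypothetical observation channel with dimensions [4 1], first for one observation and then for a batch of five sequences of length three, as you might pass to a recurrent network:

obsInfo = rlNumericSpec([4 1]);                  % hypothetical observation channel
obsSingle = {rand(obsInfo.Dimension)};           % LB = 1, LS = 1
obsBatchSeq = {rand([obsInfo.Dimension 5 3])};   % 4-by-1-by-5-by-3 (MO-by-LB-by-LS)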

Output Arguments

Action value from the agent, returned as a single-element cell array containing an array with dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size.

  • LS is the sequence length for recurrent neural networks. If the actor and critic in agent do not use recurrent neural networks, then LS = 1.

Note

When agents such as rlACAgent, rlPGAgent, or rlPPOAgent use an rlContinuousGaussianActor actor, the constraints set by the action specification are not enforced by the agent. In these cases, you must enforce action space constraints within the environment.

Action value from the actor, returned as a single-element cell array containing an array of dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the action specification.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network. If actor does not use a recurrent neural network, then LS = 1.

Note

Actors with a continuous action space do not enforce any constraints set by the action specification. Agents using an rlContinuousDeterministicActor do enforce such constraints. However, for agents using an rlContinuousGaussianActor, you must enforce action space constraints within the environment.
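As an illustration (not part of the original reference page), one possible way to enforce such constraints inside your environment code is to clip the incoming action against the limits stored in the action specification. Here, actInfo and the action value are hypothetical.

% Hypothetical clipping step, for example inside an environment step function.
actInfo = rlNumericSpec([2 1],'LowerLimit',-1,'UpperLimit',1);
act = {[1.7; -0.4]};                             % action value received from the agent
actClipped = max(min(act{1},actInfo.UpperLimit),actInfo.LowerLimit);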

Next state of the actor, returned as a cell array. If actor does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the actor to nextState using the setState function. For example:

actor = setState(actor,nextState);

Updated agent, returned as an agent object. Note that agent is a handle object. Therefore, if it contains any recurrent neural network, its internal state is updated by getAction whether or not agent is returned as an output argument. For more information about handle objects, see Handle Object Behavior.

Tips

For actor objects, the more general function evaluate behaves similarly to getAction, with the following differences (see the sketch after this list).

  • For an rlDiscreteCategoricalActor actor object, evaluate returns the probability of each possible action (instead of a sample action, as getAction does).

  • For an rlContinuousGaussianActor actor object, evaluate returns the mean and standard deviation of the Gaussian distribution (instead of a sample action, as getAction does).
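For instance, a minimal sketch (not part of the original reference page), assuming a hypothetical rlDiscreteCategoricalActor object myDiscreteActor with one observation channel described by obsInfo:

% evaluate returns the probability of every possible action;
% getAction samples and returns a single action.
prob = evaluate(myDiscreteActor,{rand(obsInfo(1).Dimension)});
act = getAction(myDiscreteActor,{rand(obsInfo(1).Dimension)});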

Version History

Introduced in R2020a