
getValue

Obtain estimated value from a critic given environment observations and actions

Description

Value Function Critic

value = getValue(ValueFcn,obs) evaluates the value function critic ValueFcn and returns the value corresponding to the observation obs. In this case, ValueFcn is an rlValueFunction approximator object.

example

Q-Value Function Critics

value = getValue(VQValueFcn,obs) evaluates the discrete-action-space Q-value function critic VQValueFcn and returns the vector value, in which each element represents the estimated value given the state corresponding to the observation obs and the action corresponding to the element number of value. In this case, VQValueFcn is an rlVectorQValueFunction approximator object.

example

value = getValue(QValueFcn,obs,act) evaluates the Q-value function critic QValueFcn and returns the scalar value, representing the value given the observation obs and action act. In this case, QValueFcn is an rlQValueFunction approximator object.

example

Return Recurrent Neural Network State

[value,state] = getValue(___) also returns the updated state of the critic object when it contains a recurrent neural network.
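For instance, the following sketch builds a hypothetical LSTM-based value critic (the network layout is assumed for illustration, not taken from this page) and retrieves both the value and the updated network state.

```matlab
% Assumed example: a value critic containing a recurrent network.
obsInfo = rlNumericSpec([4 1]);
net = dlnetwork([ ...
    sequenceInputLayer(4)
    lstmLayer(8)
    fullyConnectedLayer(1)]);
critic = rlValueFunction(net,obsInfo);

% The second output is the updated hidden state of the recurrent layers.
[val,state] = getValue(critic,{rand(4,1)});

% You can write the state back to the critic using dot notation.
critic.State = state;
```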

Use Forward

___ = getValue(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.
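As a sketch of when this matters, consider a critic whose network contains a dropout layer (a hypothetical network, assumed here for illustration): with UseForward=true the evaluation uses forward, so dropout is active as it would be during training; without it, predict is used and dropout is disabled.

```matlab
% Assumed example: a value critic with a dropout layer.
obsInfo = rlNumericSpec([4 1]);
net = dlnetwork([ ...
    featureInputLayer(4)
    fullyConnectedLayer(10)
    dropoutLayer(0.5)
    fullyConnectedLayer(1)]);
critic = rlValueFunction(net,obsInfo);

% Training-style evaluation: forward pass, dropout active.
valTrain = getValue(critic,{rand(4,1)},UseForward=true);

% Inference-style evaluation (default): predict, dropout inactive.
valInfer = getValue(critic,{rand(4,1)});
```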

Examples


Create an observation specification object (or alternatively use the getObservationInfo function to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

To approximate the value function within the critic, create a neural network. Define a single path from the network input (the observation) to its output (the value), as an array of layer objects.

net = [ featureInputLayer(4) ...
        fullyConnectedLayer(1)];

Convert the network to a dlnetwork object and display the number of weights.

net = dlnetwork(net);
summary(net);
   Initialized: true

   Number of learnables: 5

   Inputs:
      1   'input'   4 features

Create a critic using the network and the observation specification object. When you use this syntax, the network input layer is automatically associated with the environment observation according to the dimension specifications in obsInfo.

critic = rlValueFunction(net,obsInfo);

Obtain a value function estimate for a random single observation. Use an observation array with the same dimensions as the observation specification.

val = getValue(critic,{rand(4,1)})
val = single
    0.7904

You can also obtain value function estimates for a batch of observations. For example, obtain value functions for a batch of 20 observations.

batchVal = getValue(critic,{rand(4,1,20)});
size(batchVal)
ans = 1×2

     1    20

batchVal contains one value function estimate for each observation in the batch.

Create observation and action specification objects (or alternatively use the getObservationInfo and getActionInfo functions to extract the specification objects from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles, and the action space as a finite set consisting of three possible values (named 7, 5, and 3 in this case).

obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([7 5 3]);

Create a vector Q-value function approximator to use as a critic. A vector Q-value function takes only the observation as input and returns as output a single vector with as many elements as the number of possible actions. The value of each output element represents the expected discounted cumulative long-term reward for taking the action from the state corresponding to the current observation, and following the policy afterward.

To model the parameterized vector Q-value function within the critic, use a neural network. The network must have one input layer that accepts a four-element vector, as defined by obsInfo, and a single output layer having as many elements as the number of possible discrete actions (three in this case, as defined by actInfo).

Define a single path from the network input to its output as an array of layer objects.

net = [
    featureInputLayer(4) 
    fullyConnectedLayer(3)
    ];

Convert the network to a dlnetwork object and display the number of weights.

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 15

   Inputs:
      1   'input'   4 features

Create the critic using the network and the observation and action specification objects. The network input layer is automatically associated with the observation channel according to the dimension specifications in obsInfo.

critic = rlVectorQValueFunction(net,obsInfo,actInfo);

Use getValue to return the values of a random observation, using the current network weights.

v = getValue(critic,{rand(obsInfo.Dimension)})
v = 3×1 single column vector

    0.7232
    0.8177
   -0.2212

v contains three value function estimates, one for each possible discrete action.

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 10 observations.

batchV = getValue(critic,{rand([obsInfo.Dimension 10])});
size(batchV)
ans = 1×2

     3    10

batchV contains three value function estimates for each observation in the batch.

Create observation and action specification objects (or alternatively use the getObservationInfo and getActionInfo functions to extract the specification objects from an environment). For this example, define the observation space as having two continuous channels, the first carrying an 8-by-3 matrix and the second a continuous four-dimensional vector. The action specification is a continuous column vector containing two doubles.

obsInfo = [rlNumericSpec([8 3]), rlNumericSpec([4 1])];
actInfo = rlNumericSpec([2 1]);

Create a custom basis function and its initial weight matrix. Note that although a single observation in the first channel is a 2-D matrix, each input to myBasisFcn also carries batch and sequence dimensions.

myBasisFcn = @(obsA,obsB,act) [...
    ones(30,1,size(obsA,3),like=obsA);
    reshape(obsA,24,1,[]); 
    reshape(obsB,4,1,[]); 
    reshape(act,2,1,[]);
    reshape(obsA,24,1,[]).^2; 
    reshape(obsB,4,1,[]).^2; 
    reshape(act,2,1,[]).^2;
    sin(reshape(obsA,24,1,[])); 
    sin(reshape(obsB,4,1,[])); 
    sin(reshape(act,2,1,[]));
    cos(reshape(obsA,24,1,[])); 
    cos(reshape(obsB,4,1,[])); 
    cos(reshape(act,2,1,[]))];
W0 = rand(150,1);

The output of the critic is the scalar W'*myBasisFcn(obs,act), representing the Q-value function to be approximated.

Create the critic.

critic = rlQValueFunction({myBasisFcn,W0}, ...
    obsInfo,actInfo);

Use getValue to return the value of a random observation-action pair, using the current parameter matrix.

v = getValue(critic,{rand(8,3),(1:4)'},{rand(2,1)})
v = 
72.7248

Create a random observation set of batch size 64 for each channel. The third dimension is the batch size, while the fourth is the sequence length for any recurrent neural network used by the critic (not used in this case).

batchobs_ch1 = rand(8,3,64,1);
batchobs_ch2 = rand(4,1,64,1);

Create a random action set of batch size 64.

batchact = rand(2,1,64,1);

Obtain the state-action value function estimate for the batch of observations and actions.

bv = getValue(critic,{batchobs_ch1,batchobs_ch2},{batchact});
size(bv)
ans = 1×2

     1    64

bv(23)
ans = 
44.8497

Input Arguments


Value function critic, specified as an rlValueFunction approximator object.

Example: vf = rlValueFunction(dlnetwork([featureInputLayer(2) fullyConnectedLayer(10) reluLayer fullyConnectedLayer(1)]),rlNumericSpec([2 1])) creates an rlValueFunction object and assigns it to the variable vf.

Vector Q-value function critic, specified as an rlVectorQValueFunction approximator object.

Example: vqvf = rlVectorQValueFunction(dlnetwork([featureInputLayer(4) fullyConnectedLayer(10) reluLayer fullyConnectedLayer(2)]),rlNumericSpec([4 1]),rlFiniteSetSpec([-1 1])) creates an rlVectorQValueFunction object and assigns it to the variable vqvf.

Q-value function critic, specified as an rlQValueFunction object.

Example: qvf = rlQValueFunction(rlTable(rlFiniteSetSpec([-1 0 1]),rlFiniteSetSpec([-1 1])),rlFiniteSetSpec([-1 0 1]),rlFiniteSetSpec([1 -1])) creates an rlQValueFunction object and assigns it to the variable qvf.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If your approximator object has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If your approximator object does not use a recurrent neural network, then LS = 1. If the approximator has multiple observation input channels, then LS must be the same for all elements of obs.

LB and LS must be the same for all the approximator input channels (both observation and, if needed, action).

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Example: {rand(8,3,64,1),rand(4,1,64,1)}

Environment action, specified as a cell array with as many elements as there are action input channels. Note that hybrid action spaces are represented with exactly two action channels (the first discrete, the second continuous); non-hybrid action spaces must have only one channel.

Each element of act contains an array of actions for a single action input channel.

The dimensions of each element in act are MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action input channel.

  • LB is the batch size. To specify a single action, set LB = 1. To specify a batch of actions, specify LB > 1. If your approximator object has multiple action input channels, then LB must be the same for all elements of act.

  • LS specifies the sequence length for a recurrent neural network. If your approximator object does not use a recurrent neural network, then LS = 1. If the approximator has multiple action input channels, then LS must be the same for all elements of act.

LB and LS must be the same for all the approximator input channels (observations and actions).

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Example: {rand(2,1,64,1)}

Option to use a forward pass, specified as a logical value. When you specify UseForward=true, the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to appropriately change their behavior for training.

Example: true

Output Arguments


Estimated value function, returned as an array with dimensions N-by-LB-by-LS, where:

  • N is the number of outputs of the critic network.

    • For a state value critic (ValueFcn), N = 1.

    • For a single-output state-action value function critic (QValueFcn), N = 1.

    • For a multi-output state-action value function critic (VQValueFcn), N is the number of discrete actions.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network.

Updated state of the critic, returned as a cell array. If the critic does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the critic to state using dot notation. For example:

ValueFcn.State=state;


Version History

Introduced in R2020a