getMaxQValue

Obtain maximum estimated value over all possible actions for a Q-value function representation with a discrete action space

Description

[maxQ,maxActionIndex] = getMaxQValue(qValueRep,obs) returns the maximum estimated value over all possible discrete actions for the Q-value function representation qValueRep, given environment observations obs. getMaxQValue determines the discrete action for which the Q-value estimate is greatest and returns that Q value (maxQ) and the corresponding action index (maxActionIndex).

[maxQ,maxActionIndex,state] = getMaxQValue(___) returns the state of the representation. Use this syntax when qValueRep is a recurrent neural network.
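For example, with a recurrent critic you might carry the returned state forward between calls. The following is a minimal sketch, assuming qValueRep uses a recurrent network and that obs and nextObs are valid observation cell arrays:

```matlab
% Sketch: propagate the hidden state between calls (assumes qValueRep uses a
% recurrent network; obs and nextObs are assumed observation cell arrays).
[maxQ,maxActionIndex,state] = getMaxQValue(qValueRep,obs);
qValueRep = setState(qValueRep,state);   % carry the hidden state forward
[nextMaxQ,nextActionIndex] = getMaxQValue(qValueRep,nextObs);
```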

Examples

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for a multi-output Q-value function representation.

criticNetwork = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(numDiscreteAct,'Name','output')];

Create a representation for your critic using the neural network.

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation','state',criticOptions);

Obtain value function estimates for each possible discrete action using random observations.

obs = rand(4,1);
val = getValue(critic,{obs})
val = 2x1 single column vector

    0.0139
   -0.1851

val contains two value function estimates, one for each possible discrete action.

You can obtain the maximum Q-value function estimate across all the discrete actions.

[maxVal,maxIndex] = getMaxQValue(critic,{obs})
maxVal = single
    0.0139
maxIndex = 1

maxVal corresponds to the maximum entry in val.

You can also obtain maximum Q-value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 10 observations.

[batchVal,batchIndex] = getMaxQValue(critic,{rand(4,1,10)});
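Because this critic is not recurrent (LS = 1), batchVal and batchIndex should each contain one entry per observation in the batch. As a quick check (not part of the original example), you can inspect their sizes:

```matlab
% Each output has one entry per observation in the batch (LB = 10, LS = 1).
size(batchVal)
size(batchIndex)
```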

Input Arguments

qValueRep — Q-value representation, specified as an rlQValueRepresentation object.

obs — Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If qValueRep has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then LS = 1. If qValueRep has multiple observation input channels, then LS must be the same for all elements of obs.

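As an illustration, for a representation with a single observation channel of dimension 4, a batch of five observations could be passed as follows. This is a hypothetical sketch; qValueRep is assumed to be a non-recurrent critic with that observation channel:

```matlab
% One observation channel with MO = 4-by-1, batch size LB = 5, LS = 1
% (qValueRep is an assumed non-recurrent rlQValueRepresentation).
obsBatch = {rand(4,1,5)};
[maxQ,maxActionIndex] = getMaxQValue(qValueRep,obsBatch);
```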
Output Arguments

maxQ — Maximum Q-value estimate across all possible discrete actions, returned as a 1-by-LB-by-LS array, where:

  • LB is the batch size.

  • LS specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then LS = 1.

maxActionIndex — Action index corresponding to the maximum Q value, returned as a 1-by-LB-by-LS array, where:

  • LB is the batch size.

  • LS specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then LS = 1.

state — Representation state, returned as a cell array. If qValueRep does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the representation to state using the setState function. For example:

valueRep = setState(qValueRep,state);

Introduced in R2020a