createMDP

Create Markov decision process model

Description

MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.

Examples

Create an MDP model with eight states and two possible actions.

MDP = createMDP(8,["up";"down"]);

Specify the state transitions and their associated rewards.

% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;
% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;
% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;
% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;
% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;
% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;
% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;
% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;

Specify the terminal states of the model.

MDP.TerminalStates = ["s7";"s8"];

Input Arguments

states: Model states, specified as one of the following:

• Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.

• String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

actions: Model actions, specified as one of the following:

• Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.

• String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
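Both input forms described above can be sketched as follows. The state and action names used here ("idle", "busy", "done", "start", "stop") are hypothetical and chosen only for illustration; the snippet assumes Reinforcement Learning Toolbox is installed.

```matlab
% Form 1: positive integers. States and actions receive default names
% such as "s1" and "a1".
mdpByCount = createMDP(3,2);

% Form 2: string vectors. The names below are hypothetical examples.
stateNames = ["idle";"busy";"done"];
actionNames = ["start";"stop"];
mdpByName = createMDP(stateNames,actionNames);

% In both cases the model has three states and two actions.
numStates = numel(mdpByName.States);
numActions = numel(mdpByName.Actions);
```

Named states can make later assignments such as TerminalStates easier to read than the default "s1", "s2", and so on.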

Output Arguments

MDP: MDP model, returned as a GenericMDP object with the following properties.

CurrentState: Name of the current state, specified as a string.

States: State names, specified as a string vector with length equal to the number of states.

Actions: Action names, specified as a string vector with length equal to the number of actions.

T: State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by:

T(s,s',a) = probability(s' | s,a)

T is an S-by-S-by-A array, where S is the number of states and A is the number of actions.
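Because T holds probabilities, the entries over all next states s' should sum to 1 for each state-action pair. The following is a minimal sketch of such a check, not part of the createMDP interface. The two-state model and the action names "stay" and "go" are hypothetical, and the sketch assumes that entries of T that are not assigned remain zero, as the example on this page suggests.

```matlab
% Hypothetical two-state, two-action model for illustration.
MDP = createMDP(2,["stay";"go"]);

% Action 1 ("stay"): the agent remains in its current state.
MDP.T(1,1,1) = 1;
MDP.T(2,2,1) = 1;

% Action 2 ("go"): from state 1, reach state 2 with probability 0.8.
MDP.T(1,2,2) = 0.8;
MDP.T(1,1,2) = 0.2;
MDP.T(2,2,2) = 1;

% Sum over the next-state dimension (dimension 2).
% The result is an S-by-1-by-A array with one entry per (s,a) pair.
rowSums = sum(MDP.T,2);

% Compare against 1 with a small tolerance for floating-point error.
isValid = all(abs(rowSums(:) - 1) < 1e-12);
```

Running a check like this before using the model can catch a state-action pair whose transitions were left unspecified.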

R: Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

r = R(s,s',a)

TerminalStates: Terminal state names, specified as a string vector of state names.
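To show how the T, R, and TerminalStates properties fit together, the sketch below builds a small hypothetical model and computes the expected immediate reward of one action by weighting each reward by its transition probability. The model, the action names, and the variable names are assumptions made for illustration only.

```matlab
% Hypothetical two-state, two-action model for illustration.
MDP = createMDP(2,["stay";"go"]);

% Taking action 2 ("go") in state 1 reaches state 2 with probability 0.8
% (reward 5) and stays in state 1 with probability 0.2 (reward -1).
MDP.T(1,2,2) = 0.8;
MDP.R(1,2,2) = 5;
MDP.T(1,1,2) = 0.2;
MDP.R(1,1,2) = -1;

% Mark state 2 as terminal, using its default name.
MDP.TerminalStates = "s2";

% Reward for one specific transition: r = R(s,s',a).
s = 1;
sNext = 2;
a = 2;
r = MDP.R(s,sNext,a);

% Expected immediate reward of action a in state s:
% the sum over s' of T(s,s',a)*R(s,s',a).
expectedReward = sum(MDP.T(s,:,a).*MDP.R(s,:,a));
```

Here r is the reward of the single transition from state 1 to state 2, while expectedReward averages over both possible outcomes of the action.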