createMDP

Create Markov decision process object

Syntax

MDP = createMDP(states,actions)

Description

A Markov decision process (MDP) is a discrete-time stochastic control process in which the state and observation belong to finite spaces, and stochastic rules govern state transitions. MDPs are useful for studying optimization problems solved using reinforcement learning. Use the createMDP function to create a GenericMDP object with specified states and actions. You can then modify some of the object properties and pass the object to rlMDPEnv to create an environment that agents can interact with.

MDP = createMDP(states,actions) creates a Markov decision process object with the specified states and actions.


Examples


Create MDP Model


Create a GenericMDP object with eight states and two possible actions.

MDP = createMDP(8,["up";"down"])

MDP = 
  GenericMDP with properties:

            CurrentState: "s1"
                  States: [8×1 string]
                 Actions: [2×1 string]
                       T: [8×8×2 double]
                       R: [8×8×2 double]
          TerminalStates: [0×1 string]
    ProbabilityTolerance: 8.8818e-16

Specify the state transitions and their associated rewards.

% State 1 transition and reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 transition and reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 transition and reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 transition and reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 transition and reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 transition and reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 transition and reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 transition and reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;
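The 32 element-wise assignments above follow a regular pattern, so they can also be written as an edge list with one row per transition. The following is a minimal sketch that builds plain MATLAB arrays named T and R with the same contents, using sub2ind to fill every entry at once. The variable names edges, nS, and nA are illustrative, and assigning the finished arrays to MDP.T and MDP.R in one step is an assumption that this sketch does not verify.

```matlab
% Rebuild the example's transitions as plain arrays from an edge list.
% Each row is [s, sNext, a, reward], where a = 1 is "up" and a = 2 is "down".
edges = [ ...
    1 2 1 3;  1 3 2 1;
    2 4 1 2;  2 5 2 1;
    3 5 1 2;  3 6 2 4;
    4 7 1 3;  4 8 2 2;
    5 7 1 1;  5 8 2 9;
    6 7 1 5;  6 8 2 1;
    7 7 1 0;  7 7 2 0;
    8 8 1 0;  8 8 2 0];

nS = 8;   % number of states
nA = 2;   % number of actions
T = zeros(nS, nS, nA);
R = zeros(nS, nS, nA);

% Convert each (s, sNext, a) triple to a linear index and fill both arrays at once.
idx = sub2ind([nS nS nA], edges(:,1), edges(:,2), edges(:,3));
T(idx) = 1;            % every transition in this example is deterministic
R(idx) = edges(:,4);   % reward earned on that transition

% Assumed (not verified here): whole-array assignment to the object.
% MDP.T = T;  MDP.R = R;
```

Each row of edges corresponds to one pair of T and R assignments in the listing above, which makes it easier to check the model at a glance.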

Specify the terminal states of the model.

MDP.TerminalStates = ["s7";"s8"];

You can now pass MDP to rlMDPEnv to create an environment in which you can train and simulate your agents.
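Before training, it can help to know the best return an agent could reach in this model. The sketch below is not part of the toolbox: it runs value iteration on plain copies of T and R, assuming no discounting (discount = 1) so that values equal raw path rewards. The agents you train through rlMDPEnv may use a discount factor, so treat the result only as a reference point. Actions are indexed 1 = "up" and 2 = "down", and states 7 and 8 stand for the terminal states "s7" and "s8".

```matlab
% Value iteration on the example MDP, using plain arrays (no toolbox calls).
% Rows of edges are [s, sNext, a, reward]; a = 1 is "up", a = 2 is "down".
edges = [1 2 1 3; 1 3 2 1; 2 4 1 2; 2 5 2 1; 3 5 1 2; 3 6 2 4;
         4 7 1 3; 4 8 2 2; 5 7 1 1; 5 8 2 9; 6 7 1 5; 6 8 2 1;
         7 7 1 0; 7 7 2 0; 8 8 1 0; 8 8 2 0];
nS = 8; nA = 2;
T = zeros(nS, nS, nA);
R = zeros(nS, nS, nA);
idx = sub2ind([nS nS nA], edges(:,1), edges(:,2), edges(:,3));
T(idx) = 1;
R(idx) = edges(:,4);

discount = 1;           % assumed: no discounting
terminal = [7 8];       % indices of "s7" and "s8"
V = zeros(nS, 1);       % state values, initialized to zero
for iter = 1:100
    Q = zeros(nS, nA);
    for a = 1:nA
        % Q(s,a) = sum over s' of T(s,s',a) * (R(s,s',a) + discount * V(s'))
        Q(:, a) = sum(T(:, :, a) .* (R(:, :, a) + discount * V.'), 2);
    end
    Vnew = max(Q, [], 2);
    Vnew(terminal) = 0;             % episodes end at terminal states
    if max(abs(Vnew - V)) < 1e-12   % stop once the values no longer change
        V = Vnew;
        break
    end
    V = Vnew;
end
[~, bestAction] = max(Q, [], 2);    % greedy action per state: 1 = "up", 2 = "down"
```

For this model V(1) evaluates to 13, reached by the path s1 "up" to s2, "down" to s5, "down" to s8 (rewards 3 + 1 + 9).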

Input Arguments


`states` — Model states
positive integer | string vector

Model states, specified as one of the following:

Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.
String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

`actions` — Model actions
positive integer | string vector

Model actions, specified as one of the following:

Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.
String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.

Output Arguments


`MDP` — MDP model
`GenericMDP` object

MDP model, returned as a GenericMDP object with these properties.

`CurrentState` — Name of the current state
string

Name of the current state, specified as a string.

Example: MDP.CurrentState = "s2";

`States` — State names
string vector

State names, specified as a string vector with length equal to the number of states.

Example: MDP.States = ["America";"Europe";"China"];

`Actions` — Action names
string vector

Action names, specified as a string vector with length equal to the number of actions.

Example: MDP.Actions = ["GoWest";"GoEast"];

`T` — State transition matrix
3-D array

State transition matrix, specified as a 3-D array that determines the possible movements of the agent in an environment. T is a probability matrix: each entry indicates the probability of the agent moving from the current state s to a possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

$T(s, s', a) = \mathrm{probability}(s' \mid s, a)$

The transition probabilities out of a nonterminal state s under a given action must sum to either one or zero. Therefore, you must specify all stochastic transitions out of a given state at the same time.
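The sum rule can be checked on a plain array before the array is used in a model. The following sketch flags every (state, action) pair whose outgoing probabilities are neither 0 nor 1 within a tolerance. It uses 4*eps (8.8818e-16, the ProbabilityTolerance value shown in the display earlier); whether the toolbox applies its tolerance in exactly this way is an assumption. The small 4-state, 2-action array and the names tol, terminalIdx, badState, and badAction are hypothetical.

```matlab
tol = 4*eps;                  % 8.8818e-16, same value as ProbabilityTolerance above

T = zeros(4, 4, 2);
T(1, [2 3], 1) = [0.5 0.5];   % complete: probabilities sum to 1
T(2, 3, 1) = 0.5;             % incomplete: sums to 0.5, which violates the rule
terminalIdx = 4;              % state 4 is terminal and is not checked

nonterminal = setdiff(1:size(T,1), terminalIdx);
rowSums = sum(T(nonterminal, :, :), 2);                % numel(nonterminal)-by-1-by-A
rowSums = reshape(rowSums, numel(nonterminal), []);    % one row per state, one column per action
bad = ~(abs(rowSums) <= tol | abs(rowSums - 1) <= tol);
[badRow, badAction] = find(bad);
badState = nonterminal(badRow);   % map back to the original state indices
```

Here only state 2 under action 1 is reported, because only its probabilities were partially specified.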

For example, to indicate that in state 1 following action 4 there is an equal probability of moving to states 2 or 3, use this command:

MDP.T(1,[2 3],4) = [0.5 0.5];

You can also specify that, following an action, there is some probability of remaining in the same state.

MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];

Example: MDP.T(1,[1 2 3],1) = [0.25 0.5 0.25]
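To illustrate what these probabilities mean, the sketch below draws next states from the row T(s,:,a) by inverse-CDF sampling and compares the observed frequencies with [0.25 0.5 0.25]. It works on a plain 3-by-3-by-1 array and only illustrates the semantics of T. It is not the sampling code that rlMDPEnv uses internally.

```matlab
% Draw next states from T(s,:,a) by inverse-CDF sampling.
T = zeros(3, 3, 1);
T(1, [1 2 3], 1) = [0.25 0.5 0.25];   % same probabilities as the example above

s = 1; a = 1;
p = T(s, :, a);              % distribution over next states s'
cdf = cumsum(p);             % cumulative probabilities [0.25 0.75 1]
nSamples = 10000;
counts = zeros(1, numel(p));
for k = 1:nSamples
    sNext = find(rand() <= cdf, 1);   % first state whose cumulative probability covers the draw
    counts(sNext) = counts(sNext) + 1;
end
freq = counts / nSamples;    % empirical frequencies, close to p for large nSamples
```

Because the row sums to one, every draw lands on some next state. With a row that sums to zero (an unspecified transition), no next state would be defined.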

`R` — Reward transition matrix
3-D array

Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

$r = R(s, s', a)$

Example: MDP.R(1,[1 2 3],1) = [-1 0.5 2]
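Because T and R have the same size, the expected immediate reward for taking action a in state s is the probability-weighted sum over next states, rbar(s,a) = sum over s' of T(s,s',a)*R(s,s',a). The sketch below computes it on plain arrays, pairing the transition probabilities [0.25 0.5 0.25] from the T example with the rewards [-1 0.5 2]. The name rbar is illustrative only.

```matlab
% Expected immediate reward: rbar(s,a) = sum over s' of T(s,s',a) * R(s,s',a)
T = zeros(3, 3, 1);
R = zeros(3, 3, 1);
T(1, [1 2 3], 1) = [0.25 0.5 0.25];   % transition probabilities out of state 1
R(1, [1 2 3], 1) = [-1 0.5 2];        % rewards for the same three transitions

% Sum over the next-state dimension, then reshape to S-by-A.
rbar = reshape(sum(T .* R, 2), size(T, 1), size(T, 3));
```

For state 1 this gives 0.25*(-1) + 0.5*0.5 + 0.25*2 = 0.5. The other states have no transitions specified, so their expected rewards are 0.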

`TerminalStates` — Terminal state names
string vector

Terminal state names, specified as a string vector of state names.

Example: MDP.TerminalStates = "s3"

Version History

Introduced in R2019a
