Create Markov decision process environment for reinforcement learning
A Markov decision process (MDP) is a discrete-time stochastic control process. It
provides a mathematical framework for modeling decision making in situations where outcomes
are partly random and partly under the control of the decision maker. MDPs are useful for
studying optimization problems solved using reinforcement learning. Use
rlMDPEnv to create a MATLAB®-based Markov decision process environment object for reinforcement learning.
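For reference, one standard way to formalize an MDP is shown below. It uses the same index order as the T and R arrays in the grid world example (current state, next state, action). The discount factor γ is a conventional addition and is not a property of rlMDPEnv. For episodic tasks such as the grid world below, the sum in the return stops at the terminal state.

```latex
\begin{aligned}
&\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R) \\
&T(s, s', a) = \Pr\left(s_{t+1} = s' \mid s_t = s,\ a_t = a\right),
  \qquad \textstyle\sum_{s' \in \mathcal{S}} T(s, s', a) = 1 \\
&r_{t+1} = R(s_t, s_{t+1}, a_t) \\
&G_t = \textstyle\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1
\end{aligned}
```

The decision maker chooses actions to maximize the expected return, while T supplies the "partly random" part of each outcome.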
env = rlMDPEnv(MDP)
Model— Markov decision process model
Markov decision process model, specified as a GridWorld object or a GenericMDP object.
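The example later on this page passes a GridWorld model. As a hedged sketch, a generic model built with createMDP can be passed to rlMDPEnv in the same way. The two-state, two-action model here is purely illustrative: the action names "stay"/"move" are hypothetical, and the state names "s1"/"s2" are the defaults createMDP is assumed to generate.

```matlab
% Illustrative two-state, two-action MDP (not part of the grid world example)
MDP = createMDP(2, ["stay";"move"]);   % states assumed to be named "s1" and "s2"

% T(s,s',a): probability of going from state s to s' under action a
T = zeros(2,2,2);
T(:,:,1) = eye(2);        % "stay" keeps the current state
T(:,:,2) = [0 1; 1 0];    % "move" switches to the other state
MDP.T = T;

% R(s,s',a): reward for that transition; entering "s2" earns +1, anything else -1
R = -1*ones(2,2,2);
R(:,2,:) = 1;
MDP.R = R;
MDP.TerminalStates = "s2";

env = rlMDPEnv(MDP);
obsInfo = getObservationInfo(env);    % finite set of state indices
```

Building T and R as whole arrays before assigning them keeps every action slice a valid distribution at each assignment.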
ResetFcn— Reset function
Reset function, specified as a function handle.
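As a sketch, a reset function for this environment takes no input arguments and returns the index of the state in which each episode starts. The code uses state2idx rather than a hard-coded number. The random-start variant is a hypothetical alternative, not required by rlMDPEnv.

```matlab
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);

% Always start episodes in cell [2,1]
startIdx = state2idx(GW, "[2,1]");
env.ResetFcn = @() startIdx;
obs0 = reset(env);      % initial observation, i.e. a state index

% Hypothetical alternative: start in a random cell other than [5,5]
candidates = setdiff(1:numel(GW.States), state2idx(GW, "[5,5]"));
env.ResetFcn = @() candidates(randi(numel(candidates)));
```

Capturing startIdx in the anonymous function fixes the start cell when the handle is created, so later edits to GW do not change it.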
|Obtain action data specifications from reinforcement learning environment or agent|
|Obtain observation data specifications from reinforcement learning environment or agent|
|Simulate a trained reinforcement learning agent within a specified environment|
|Train a reinforcement learning agent within a specified environment|
|Validate custom reinforcement learning environment|
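A minimal sketch of querying a grid world environment with getObservationInfo and getActionInfo, then checking it with validateEnvironment. For a 5-by-5 grid, one would expect 25 observation elements (one per state index) and 4 action elements (North, South, East, West).

```matlab
env = rlMDPEnv(createGridWorld(5,5));

obsInfo = getObservationInfo(env);   % finite set of state indices
actInfo = getActionInfo(env);        % finite set of action indices

nObs = numel(obsInfo.Elements)
nAct = numel(actInfo.Elements)

% Errors if the environment's outputs do not match its specifications
validateEnvironment(env)
```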
For this example, consider a 5-by-5 grid world with the following rules:
A 5-by-5 grid world bounded by borders, with 4 possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5] and [4,3] (black cells).
All other actions result in -1 reward.
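The cells in these rules are written as [row,column], while the transition and reward arrays used below are indexed by state number. Assuming states are numbered column by column (the same order sub2ind uses), the cells named in the rules map to the following indices. In the example itself, state2idx performs this lookup.

```matlab
gridSize = [5 5];
% sub2ind(size, row, col) numbers cells column by column: (col-1)*5 + row
startIdx  = sub2ind(gridSize, 2, 1);                  % [2,1]
goalIdx   = sub2ind(gridSize, 5, 5);                  % [5,5]
jumpFrom  = sub2ind(gridSize, 2, 4);                  % [2,4]
jumpTo    = sub2ind(gridSize, 4, 4);                  % [4,4]
obstacles = sub2ind(gridSize, [3 3 3 4], [3 4 5 3]);  % [3,3] [3,4] [3,5] [4,3]

disp([startIdx goalIdx jumpFrom jumpTo])   % 2 25 17 19
disp(obstacles)                            % 13 18 23 14
```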
First, create a GridWorld object using the createGridWorld function.
GW = createGridWorld(5,5)
GW = 
  GridWorld with properties:
          GridSize: [5 5]
      CurrentState: "[1,1]"
            States: [25×1 string]
           Actions: [4×1 string]
                 T: [25×25×4 double]
                 R: [25×25×4 double]
    ObstacleStates: [0×1 string]
    TerminalStates: [0×1 string]
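The T and R properties in this display are 25-by-25-by-4 arrays indexed as (current state, next state, action). The following sketch checks that the default transition model is a valid probability distribution and reads off where one action leads. The expected next cell [2,2] assumes that the default grid moves deterministically and that East increases the column index.

```matlab
GW = createGridWorld(5,5);

% Each T(s,:,a) is a distribution over next states, so it should sum to 1
rowSums = sum(GW.T, 2);                          % 25-by-1-by-4
isStochastic = all(abs(rowSums(:) - 1) < 1e-12)

% Where does action 3 (East) lead from cell [2,1]?
s = state2idx(GW, "[2,1]");
nextState = GW.States(GW.T(s, :, 3) > 0)
```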
Now, set the initial, terminal and obstacle states.
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.
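% Block moves into obstacle cells (the toolbox spells this function name "Transtion")
updateStateTranstionForObstacles(GW)
% Jump rule: from [2,4], every action leads to [4,4]
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;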
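One way to confirm the jump rule is sketched below. It rebuilds the transition part of GW so it runs on its own. After the two assignments, the [2,4] row of every action slice should put all its probability on [4,4]. Zeroing the row first matters: otherwise the old move targets would keep probability 1 and each row would sum to 2.

```matlab
GW = createGridWorld(5,5);
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(GW)

s  = state2idx(GW, "[2,4]");
sp = state2idx(GW, "[4,4]");
GW.T(s,:,:) = 0;      % clear the normal moves out of [2,4]
GW.T(s,sp,:) = 1;     % every action now jumps to [4,4]

jumpProbs = squeeze(GW.T(s, sp, :))'          % probability of the jump, per action
rowSums   = squeeze(sum(GW.T(s, :, :), 2))'   % total outgoing probability, per action
```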
Next, define the rewards in the reward transition matrix.
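nS = numel(GW.States);
nA = numel(GW.Actions);
% Default reward of -1 for every transition
GW.R = -1*ones(nS,nS,nA);
% +5 for the jump from [2,4] to [4,4]
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
% +10 for any transition into the terminal state
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;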
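Because R is indexed as (current state, next state, action), the reward for one transition is a single array lookup. The sketch below rebuilds the same reward array in plain MATLAB and sums the undiscounted reward along one hand-picked path that uses the jump. It assumes the column-by-column state numbering that sub2ind uses. The route is illustrative, not an optimal policy.

```matlab
nS = 25; nA = 4;
idx = @(r, c) sub2ind([5 5], r, c);   % assumed column-major state numbering

R = -1*ones(nS, nS, nA);              % default -1 per transition
R(idx(2,4), idx(4,4), :) = 5;         % jump reward
R(:, idx(5,5), :) = 10;               % entering the terminal cell

% [2,1] -E-> [2,2] -E-> [2,3] -E-> [2,4] -jump-> [4,4] -E-> [4,5] -S-> [5,5]
route = [idx(2,1) idx(2,2) idx(2,3) idx(2,4) idx(4,4) idx(4,5) idx(5,5)];
acts  = [3 3 3 3 3 2];   % North = 1, South = 2, East = 3, West = 4; any action works at [2,4]

G = 0;
for k = 1:numel(acts)
    G = G + R(route(k), route(k+1), acts(k));
end
disp(G)   % 11
```

The +10 assignment runs last, so it would override any earlier entry in the terminal column. Here no entries overlap, so the order does not change the result.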
Now, use rlMDPEnv to create a grid world environment from the GridWorld object GW.
env = rlMDPEnv(GW)
env = 
  rlMDPEnv with properties:
       Model: [1×1 rl.env.GridWorld]
    ResetFcn:
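With an environment in hand, reset and step can be used to interact with it directly. So that it runs on its own, the sketch below uses a stripped-down grid (no obstacles, no jump, default rewards). It starts in [2,1] through a reset function and follows a fixed action sequence, East four times and then South three times, to the terminal cell [5,5]. The rewards it collects depend on the default R of createGridWorld, so no value is asserted for them.

```matlab
GW = createGridWorld(5,5);
GW.TerminalStates = '[5,5]';
env = rlMDPEnv(GW);
env.ResetFcn = @() state2idx(GW, "[2,1]");

obs = reset(env);
actions = [3 3 3 3 2 2 2];          % East = 3, South = 2
visited = strings(numel(actions), 1);
G = 0;
for k = 1:numel(actions)
    % step returns the next observation (a state index), the reward, and a done flag
    [obs, reward, isDone] = step(env, actions(k));
    G = G + reward;
    visited(k) = GW.States(obs);
    if isDone
        break
    end
end
disp(visited')
```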
You can visualize the grid world environment using the