Create Custom Grid World Environments

A grid world is a two-dimensional, cell-based environment where the agent starts from one cell and moves toward a terminal cell while collecting as much reward as possible. Grid world environments are useful for applying reinforcement learning algorithms to discover optimal paths and policies for agents on the grid to reach their terminal goal in the fewest number of moves.

Reinforcement Learning Toolbox™ lets you create custom MATLAB® grid world environments for your own applications. To create a custom grid world environment:

  1. Create the grid world model.

  2. Configure the grid world model.

  3. Use the grid world model to create your own grid world environment.

Grid World Model

You can create your own grid world model using the createGridWorld function. Specify the grid size when creating the GridWorld model object.
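As a minimal sketch (the 5-by-5 grid size and variable names are arbitrary choices), you can create grid world models like this:

```matlab
% Create a 5-by-5 grid world model with the default
% four-directional ('Standard') move set.
GW = createGridWorld(5,5);

% Equivalently, pass the move set explicitly; 'Kings' adds
% the four diagonal moves.
GWKings = createGridWorld(5,5,'Kings');
```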

The GridWorld object has the following properties:

GridSize (Read-Only: Yes)

Dimensions of the grid world, displayed as an [m,n] vector, where m is the number of rows and n is the number of columns of the grid.

CurrentState (Read-Only: No)

Name of the current state of the agent, specified as a string. You can use this property to set the initial state of the agent. By default, the agent starts from the [1,1] cell.

The agent starts from CurrentState after you call the reset function on the rlMDPEnv environment object.
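For instance, assuming a grid world model GW already exists, setting the initial state is a one-line assignment (the cell [2,3] is an illustrative choice):

```matlab
% Create a 5-by-5 grid world model (illustrative size).
GW = createGridWorld(5,5);

% Start the agent from cell [2,3] instead of the default [1,1].
GW.CurrentState = "[2,3]";
```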

States (Read-Only: Yes)

A string vector containing the state names of the grid world. For instance, for a 2-by-2 grid world model GW:

GW.States = ["[1,1]";"[2,1]";"[1,2]";"[2,2]"]

Actions (Read-Only: Yes)

A string vector containing the list of possible actions that the agent can use. You can set the actions when you create the grid world model by using the moves argument:

GW = createGridWorld(m,n,moves)

Specify moves as either 'Standard' or 'Kings'.

moves = 'Standard':

GW.Actions = ["N";"S";"E";"W"]

moves = 'Kings':

GW.Actions = ["N";"S";"E";"W";"NE";"NW";"SE";"SW"]

T (Read-Only: No)

State transition matrix, specified as a 3-D array. T is a probability matrix that gives how likely the agent is to move from the current state s to a possible next state s' by performing action a.

T can be denoted as:

T(s,s',a) = probability(s'|s,a).

For instance, consider a 5-by-5 deterministic grid world object GW with the agent in cell [3,1]. Extract the state transition matrix for the 'N' action:

northStateTransition = GW.T(:,:,1)

The value of northStateTransition(3,2) is 1, since the agent moves from cell [3,1] (state 3) to cell [2,1] (state 2) with action 'N'. A probability of 1 indicates that, from a given state, if the agent goes north it has a 100% chance of moving one cell north on the grid. For an example on setting up the state transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.
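As a sketch of how the state names, state indices, and T fit together (the 5-by-5 size is illustrative; states are ordered column-wise, so cell [3,1] is state index 3):

```matlab
% Deterministic 5-by-5 grid world model.
GW = createGridWorld(5,5);

% Transition matrix for the first action, 'N'.
northStateTransition = GW.T(:,:,1);

% Find which state the agent reaches from state 3 ([3,1])
% when moving north.
nextStateIdx = find(northStateTransition(3,:));
nextState = GW.States(nextStateIdx)   % cell [2,1]
```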

R (Read-Only: No)

Reward transition matrix, specified as a 3-D array. R determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T.

Reward transition matrix R can be denoted as

r = R(s,s',a).

Set up R such that the agent receives a reward after every action. For instance, you can set up a positive reward if the agent transitions over obstacle states and when it reaches the terminal state. You can also set up a default reward of -1 for all actions the agent takes, independent of the current state and next state. For an example on setting up the reward transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.
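A sketch of such a reward setup follows (the grid size, terminal state, and reward values of -1 and +10 are illustrative choices; state2idx converts state names to state indices):

```matlab
GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";
nS = numel(GW.States);
nA = numel(GW.Actions);

% Default reward of -1 for every action, independent of the
% current and next states.
GW.R = -1*ones(nS,nS,nA);

% Reward of +10 for any transition into the terminal state.
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
```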

ObstacleStates (Read-Only: No)

ObstacleStates are states that cannot be reached in the grid world, specified as a string vector. Consider the following 5-by-5 grid world model GW.

The black cells are obstacle states, and you can specify them by:

GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.

TerminalStates (Read-Only: No)

TerminalStates are the final states in the grid world, specified as a string vector. Consider the previous 5-by-5 grid world model GW. The blue cell is the terminal state, and you can specify it by:

GW.TerminalStates = "[5,5]";

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.
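Putting the two properties together, a sketch of configuring obstacles and a terminal state (the specific cells are illustrative; updateStateTranstionForObstacles is the toolbox helper that adjusts T for obstacles):

```matlab
GW = createGridWorld(5,5);

% Mark the obstacle and terminal cells.
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
GW.TerminalStates = "[5,5]";

% Update T so that any action leading into an obstacle state
% keeps the agent in its current state.
updateStateTranstionForObstacles(GW)
```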

Grid World Environment

You can create a Markov decision process (MDP) environment using rlMDPEnv from the grid world model created in the previous steps. An MDP is a discrete-time stochastic control process that provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. The agent uses the grid world environment object rlMDPEnv to interact with the grid world model object GridWorld.

For more information, see rlMDPEnv and Train Reinforcement Learning Agent in Basic Grid World.
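As a final sketch, you can wrap a grid world model in an environment and query it (variable names are arbitrary):

```matlab
GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";

% Wrap the model in an MDP environment.
env = rlMDPEnv(GW);

% Inspect the observation and action specifications the agent
% will train against.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% reset places the agent at GW.CurrentState and returns the
% corresponding initial observation.
s0 = reset(env);
```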
