- www.mathworks.com/help/reinforcement-learning/ug/create-custom-environment-from-class-template.html
- www.mathworks.com/help/reinforcement-learning/ref/rl.util.rlfinitesetspec.html
- www.mathworks.com/help/reinforcement-learning/ref/rl.util.rlnumericspec.html
Different Action spaces in different steps
9 views (last 30 days)
In MATLAB RL, is it possible for the agent to have one type of action space in the first step of an episode but a different action space after that? For example, in a grid world, the first action of each episode could choose where to start on the grid, and subsequent actions could choose where to move from that point.
0 comments
Answers (1)
Shantanu Dixit
on 12 Jul 2024
Hi Danial,
It is my understanding that you want the agent to have one type of action space for the first step of an episode and a different action space for the remaining steps.
Considering the example of a grid world, the first action can be drawn from an ‘rlFiniteSetSpec’ whose elements map to starting positions on the grid. Once the first step is taken, the action space can be switched to the four movement actions, which a policy then uses for the rest of the episode.
You can refer to the code below, which follows a custom environment class implementation together with a sample policy function for simulation.
% Save this class in its own file, CustomGridWorld.m
classdef CustomGridWorld < rl.env.MATLABEnvironment
    properties
        GridSize = [5, 5];       % Size of the grid
        CurrentState = [1, 1];   % Current position in the grid
        TerminalState = [5, 5];  % Goal position in the grid
        Obstacles = [3, 3; 3, 4; 3, 5; 4, 3]; % Positions of obstacles in the grid
        IsFirstStep = true;      % Flag to indicate the first step
    end
    methods
        function this = CustomGridWorld()
            % Define the observation and action spaces
            ObservationInfo = rlNumericSpec([2 1]);
            ObservationInfo.Name = 'Grid State';
            % Initial action space: choosing a start position (25 cells)
            ActionInfo = rlFiniteSetSpec(1:25);
            ActionInfo.Name = 'Grid Action';
            this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
        end
        function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
            LoggedSignals = []; % Initialize LoggedSignals
            if this.IsFirstStep
                % Decode the action to a starting position
                disp('Taking the first step!')
                [row, col] = ind2sub(this.GridSize, Action);
                this.CurrentState = [row, col];
                this.IsFirstStep = false;
                % Switch the action space to the movement actions (up, down, left, right)
                this.ActionInfo = rlFiniteSetSpec([1, 2, 3, 4]);
                this.ActionInfo.Name = 'Grid Action';
            else
                % Standard grid movement logic
                nextState = this.CurrentState;
                switch Action
                    case 1 % Up
                        nextState = this.CurrentState + [-1, 0];
                    case 2 % Down
                        nextState = this.CurrentState + [1, 0];
                    case 3 % Left
                        nextState = this.CurrentState + [0, -1];
                    case 4 % Right
                        nextState = this.CurrentState + [0, 1];
                end
                % Move only if the next state is within bounds and not an obstacle
                if all(nextState > 0) && all(nextState <= this.GridSize) && ...
                        ~ismember(nextState, this.Obstacles, 'rows')
                    this.CurrentState = nextState;
                else
                    disp('At an obstacle or boundary, take another action!')
                end
            end
            % Set Observation, Reward, and IsDone
            Observation = this.CurrentState';
            if isequal(this.CurrentState, this.TerminalState)
                Reward = 10; % Reward for reaching the terminal state
                IsDone = true;
            else
                Reward = -1; % Small negative reward for each step
                IsDone = false;
            end
        end
        function InitialObservation = reset(this)
            % Reset the environment to the initial state
            this.CurrentState = [1, 1];
            this.IsFirstStep = true;
            % Restore the initial (start-position) action space
            this.ActionInfo = rlFiniteSetSpec(1:25);
            this.ActionInfo.Name = 'Grid Action';
            InitialObservation = this.CurrentState';
        end
        function actionInfo = getActionInfo(this)
            % Return the current action space information
            actionInfo = this.ActionInfo;
        end
    end
end
% Define a simple policy function (save in its own file, simplePolicy.m)
function action = simplePolicy(state, terminalState, actions)
    if state(2) < terminalState(2)
        action = actions(4); % Move right
    elseif state(1) < terminalState(1)
        action = actions(2); % Move down
    else
        action = actions(randi(length(actions))); % Random action if already at the terminal state
    end
end
% Create an instance of the CustomGridWorld environment
env = CustomGridWorld();
numEpisodes = 2; % user defined
numSteps = 8;    % user defined
for episode = 1:numEpisodes
    % Reset the environment at the start of each episode
    initialObservation = env.reset();
    disp('Starting a new episode')
    disp(['Episode ', num2str(episode), ' started']);
    disp(['Initial State: ', mat2str(initialObservation)]);
    for step = 1:numSteps
        % Get the current action space
        actionInfo = env.getActionInfo();
        actions = actionInfo.Elements;
        if env.IsFirstStep
            disp('Taking a random action for the first step');
            action = actions(randi(length(actions)));
        else
            % Select an action from the current state using the policy
            action = simplePolicy(env.CurrentState, env.TerminalState, actions);
        end
        % Take the action and get the next observation, reward, and done flag
        [observation, reward, isDone, loggedSignals] = env.step(action);
        % Display the results of the step
        disp(['Step ', num2str(step)]);
        disp(['Action: ', num2str(action)]);
        disp(['State: ', mat2str(observation)]);
        disp(['Reward: ', num2str(reward)]);
        disp(['IsDone: ', num2str(isDone)]);
        % If the episode is done, break the loop
        if isDone
            disp('Episode finished.');
            disp('-----------------------');
            break;
        end
    end
end
% Display the final observation if the last episode did not finish early
if ~isDone
    disp(['Final observation after ', num2str(numSteps), ' steps:']);
    disp(observation);
end
First step: The action space consists of 25 actions, one for each cell of the 5x5 grid; the chosen index is decoded into a starting position with ‘ind2sub’.
Subsequent steps: The action space consists of four actions, choosing a direction (up, down, left, right) to move from the current position.
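For instance, because MATLAB uses column-major linear indexing, a first-step action index decodes (and encodes) like this:

```matlab
% Decoding a start-position action into grid coordinates (column-major)
[row, col] = ind2sub([5 5], 7);   % row = 2, col = 2
% The inverse mapping, if ever needed, is sub2ind
idx = sub2ind([5 5], 2, 2);       % idx = 7
```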
The ‘CustomGridWorld’ class is designed to handle the two action spaces based on whether it is the first step or a subsequent step.
The ‘IsFirstStep’ property is used to check whether the current step is the first step of the episode.
The ‘getActionInfo’ method returns the agent’s current possible choices through the ‘Elements’ property of ‘rlFiniteSetSpec’. Once the first step is taken, the action space is switched by setting this.ActionInfo to the new set [1, 2, 3, 4] (up, down, left, right), and ‘reset’ restores the original 25-element set at the start of the next episode.
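Assuming the class above is saved as CustomGridWorld.m, a quick sanity check makes the switch visible:

```matlab
env = CustomGridWorld();
env.reset();
info = env.getActionInfo();
disp(info.Elements')    % elements 1 through 25 before the first step
env.step(7);            % first action: start at grid cell (2, 2)
info = env.getActionInfo();
disp(info.Elements')    % elements 1 through 4 after the first step
```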
I hope this gives you an idea of how to change the action space in a grid-based setting.
To learn about the custom environment class template, ‘rlFiniteSetSpec’, and ‘rlNumericSpec’, refer to the MathWorks documentation links at the top of this answer.
Thanks.
0 comments