Optimal path problem using Q-learning algorithm

30 views (last 30 days)
Aristi Christoforou on 24 Jun 2022
Answered: Pratik on 14 Nov 2023
Hello, I want to find the optimal path for my target to the nearest corner of an environment with 3 obstacles using Q-learning. For some reason the code only works when the nearest corner is the upper-left or upper-right corner of the environment, and in some cases it goes into an infinite loop when the nearest corner is the lower-left corner. In general, for the lower-left and lower-right corners the agent prefers to take a step up first, so it takes many more actions than the optimal path needs. I don't know what I can do to fix this problem :( I would really appreciate it if anyone could help me find my mistake so I can get the optimal path.
n = 12;
dim=n*n;
maze = ones(n,n);
%%
%{
variables:
r = row of agent
c = column of agent
n = number of rows and columns
rt = row of thief
ct = column of thief
ruc = thief's distance from the upper-right corner (1,n)
rdc = thief's distance from the lower-right corner (n,n)
luc = thief's distance from the upper-left corner (1,1)
ldc = thief's distance from the lower-left corner (n,1)
rpt = row of the thief's position in state path(i)
goalt = position of the goal (I take one of the 4 corners of the maze and
convert it into a single linear index of the matrix using the tool 1)
rewardt = reward matrix of the thief
distgt = goal of the thief (distance number: how far away the closest
corner is)
cornerofgoal = corner with the best distance for the thief (gives the
index of the optimal corner: ruc=1, rdc=2, luc=3, ldc=4)
q = Q-value matrix
gamma = discount factor
alpha = learning rate
maxItr = max number of iterations
csa = current state of agent
nsa = next state of agent
cst = current state of thief
nst = next state of thief
n_actions = possible actions of the agent for the chosen state
distta = distance between thief and agent
fores, ypol = quotient and remainder of the division used to find the row
and column of each position of the thief along its path
%}
%% creating maze
for i=1:3
    x=randi([1,n]); %row
    y=randi([1,n]); %column
    while (x==1 && y==1) || (x==1 && y==n) || (x==n && y==1) || (x==n && y==n)
        x=randi([1,n]);
        y=randi([1,n]);
    end
    maze(x,y)=-50;
end
%disp(maze)
n=length(maze);
figure
imagesc(maze)
colormap(spring)
colorbar
set(gca,'xtick',0:1:n)
set(gca,'ytick',0:1:n)
grid on
% position the agent randomly
r=randi([1,n]);
c=randi([1,n]);
while maze(r,c)<-30
    r=randi([1,n]);
    c=randi([1,n]);
end
text(c,r,'\bullet','Color','cyan','FontSize',28,'HorizontalAlignment','center')
% position the thief randomly
rt=randi([1,n]);
ct=randi([1,n]);
while maze(rt,ct)<-30
    rt=randi([1,n]);
    ct=randi([1,n]);
end
disp(rt)
disp(ct)
text(ct,rt,'\bullet','Color','red','FontSize',28,'HorizontalAlignment','center')
%% finding the distances of the thief's position from each corner
ruc=norm([rt ct]-[1 n])
rdc=norm([rt ct]-[n n])
luc=norm([rt ct]-[1 1])
ldc=norm([rt ct]-[n 1])
%% Setting the goal for the thief
goalt=0;
[distgt,cornerofgoal]=min([ruc rdc luc ldc])
if cornerofgoal==1
    goalt=n
elseif cornerofgoal==2
    goalt=n*n
elseif cornerofgoal==3
    goalt=1
elseif cornerofgoal==4
    goalt=1+n*(n-1)
end
%%
% find the path of the thief (best path)
%{
for every action of the thief I will find the reward
creating the reward matrix for the maze (for the thief):
Up (i-n)
Down (i+n)
Left (i-1)
Right (i+1)
Reward is -Inf (~ no reward) for any other action, so no action other than the above will ever be taken.
%}
if distgt==0
    x=['the thief has won, he was already in corner ',num2str(cornerofgoal),' of the maze'];
    disp(x)
else
    rewardt=zeros(n*n);
    for i=1:dim % why is this needed?
        rewardt(i,:)=reshape(maze',1,n*n); % not sure about n*n
    end
    for i=1:dim
        for j=1:dim
            if j~=i-n && j~=i+n && j~=i-1 && j~=i+1 % if the action is not one of the above, the reward is -Inf
                rewardt(i,j)=-Inf;
            end
        end
    end
    for i=1:n:dim % should I remove the j = NW, SW cases?
        for j=1:i+n
            if j==i-1
                rewardt(i,j)=-Inf;
                rewardt(j,i)=-Inf;
            end
        end
    end
    %% Q-Learning algorithm
    q = randn(size(rewardt));
    gamma = 0.95;
    alpha = 0.2;
    maxItr = 50;
    for i=1:maxItr
        % Starting from the start position
        cs=ct+n*(rt-1); % convert the thief's position into a linear index of the n*n reward matrix
        % Repeat until the goal state is reached
        while(1)
            % possible actions for the chosen state
            n_actions = find(rewardt(cs,:)>=0);
            % choose an action at random and set it as the next state
            ns = n_actions(randi([1 length(n_actions)]));
            % find all the possible actions for the selected state
            n_actions = find(rewardt(ns,:)>=0);
            % find the maximum Q-value, i.e. the next state with the best action
            max_q = 0;
            for j=1:length(n_actions)
                max_q = max(max_q,q(ns,n_actions(j)));
            end
            % update the Q-values as per the Bellman equation
            q(cs,ns)=rewardt(cs,ns)+gamma*max_q;
            % check whether the episode has finished, i.e. the goal has been reached
            if(cs == goalt)
                break;
            end
            % set the current state to the next state
            cs=ns;
        end
    end
    %% Solving the maze
    % * starting from the first position
    start =ct+n*(rt-1);
    move = 0;
    path = start;
    %%
    % * iterating until the goal state is reached
    while(move~=goalt)
        [~,move]=max(q(start,:));
        % deleting chances of getting stuck in small loops (up to order 4)
        if ismember(move,path)
            [~,x]=sort(q(start,:),'descend');
            move=x(2);
            if ismember(move,path)
                [~,x]=sort(q(start,:),'descend');
                move=x(3);
                %{
                if ismember(move,path)
                    [~,x]=sort(q(start,:),'descend');
                    move=x(4);
                    if ismember(move,path)
                        [~,x]=sort(q(start,:),'descend');
                        move=x(5);
                    end
                end
                %}
            end
        end
        % appending the next action/move to the path
        path=[path,move];
        start=move;
    end
    %% Solution of the maze
    % i.e. the optimal path from START to GOAL
    %%
    fprintf('Final path: %s\n',num2str(path))
    fprintf('Total steps: %d\n',length(path))
    % reproduce the path as the plot matrix pmat
    pmat=zeros(n,n);
    [q, r]=quorem(sym(path),sym(n)); % requires the Symbolic Math Toolbox; note this overwrites q and r
    q=double(q);r=double(r);
    q(r~=0)=q(r~=0)+1;r(r==0)=n;
    for i=1:length(q)
        pmat(q(i),r(i))=50;
    end
    %% Final plot of the maze
    %%
    figure
    imagesc(pmat)
    colormap(white)
    for i=1:n
        for j=1:n
            if maze(i,j)==min(maze(:)) % mark the obstacle cells
                text(j,i,'X','HorizontalAlignment','center')
            end
            if pmat(i,j)==50
                text(j,i,'\bullet','Color','red','FontSize',28)
            end
        end
    end
    hold on
    imagesc(maze,'AlphaData',0.2)
    colorbar
    colormap(winter)
    grid on
    set(gca,'xtick',0:1:n)
    set(gca,'ytick',0:1:n)
    hold off
end
4 comments
Sam Chak on 24 Jun 2022
Not yet, @Aristi Christoforou. I'm still studying the code.
Which parameters influence the learning process most strongly?
Aristi Christoforou on 24 Jun 2022
Edited: Aristi Christoforou on 24 Jun 2022
gamma, alpha, rewardt and maxItr. The problem is that, for some reason, the learning process always starts with a step up, which is not ideal when the goal corner is rdc or ldc. Thanks in advance for the help, @Sam Chak. My solution is based on this example: https://www.mathworks.com/matlabcentral/fileexchange/67989-q-learning-example
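For reference, a minimal way to check which move the trained policy prefers from the thief's start cell (this assumes the workspace variables q, rewardt, rt, ct and n produced by the code above; it is only a diagnostic sketch, not part of my script):

% diagnostic sketch: rank the allowed moves from the start state by Q-value
cs = ct + n*(rt-1);                   % linear index of the thief's start cell
valid = find(rewardt(cs,:) >= 0);     % moves the code treats as allowed
[vals, order] = sort(q(cs,valid), 'descend');
fprintf('Start state %d, candidate moves ranked by Q-value:\n', cs);
for k = 1:numel(valid)
    fprintf('  move to state %d : Q = %.3f\n', valid(order(k)), vals(k));
end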


Answers (1)

Pratik on 14 Nov 2023
Hi Aristi,
As I understand it, you want to find the most optimal path to the nearest corner of the grid while avoiding the obstacles, using Q-learning. However, the algorithm does not always find the optimal path and gets stuck in an infinite loop in some cases.
The example this code is based on aims to solve a maze; finding the most optimal path is not its primary objective. To find the optimal path, the following methods can be used (see the sketch after this list):
1. Assign a negative reward to each step in the path; when the agent tries to maximise the total reward, it will then prefer the shortest (optimal) path.
2. The current code chooses the next action at random from the set of possible actions. An epsilon-greedy strategy can be implemented instead, which balances exploration and exploitation.
3. Experimenting with different values of the learning rate and discount factor, and increasing the number of iterations, can also improve performance.
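A minimal sketch of points 1-3 follows. It reuses rewardt, goalt, rt, ct and n from the question's code, re-initialises q to zeros, and replaces the maze cell value with a flat -1 reward per move; the parameter values are only assumptions to experiment with, not a tested fix:

epsilon = 0.1;                 % exploration probability (assumed value)
stepPenalty = -1;              % flat negative reward per move (assumed value)
gamma = 0.9;                   % discount factor (tune as needed)
alpha = 0.2;                   % learning rate (tune as needed)
maxItr = 500;                  % more episodes than the original 50
q = zeros(n*n);                % zero init, so the terminal (goal) row stays at 0
for ep = 1:maxItr
    cs = ct + n*(rt-1);                        % thief's start state (linear index)
    while cs ~= goalt
        n_actions = find(rewardt(cs,:) >= 0);  % valid moves from cs
        if rand < epsilon                      % explore: random valid move
            ns = n_actions(randi(numel(n_actions)));
        else                                   % exploit: current best estimate
            [~,k] = max(q(cs,n_actions));
            ns = n_actions(k);
        end
        max_q = max(q(ns, rewardt(ns,:) >= 0));
        % standard Q-learning update with learning rate alpha and a cost per step
        q(cs,ns) = q(cs,ns) + alpha*(stepPenalty + gamma*max_q - q(cs,ns));
        cs = ns;
    end
end

The flat per-step cost makes shorter paths strictly better, and the zero initialisation combined with negative rewards pushes the agent towards moves it has not tried yet.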
The infinite loop can occur when the algorithm is stuck in a sub-optimal cycle where the best action in state 'A' leads to state 'B' and the best action in state 'B' leads back to state 'A'. To avoid this, the path can be tracked and states that were already visited can be excluded (or given a strongly negative value), as sketched below.
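A minimal sketch of that idea for the path-extraction step, assuming q, rewardt, goalt, rt, ct and n from the code above (just the masking idea, not a drop-in replacement):

% greedy path extraction that masks already-visited states
cur  = ct + n*(rt-1);                  % thief's start state
path = cur;
while cur ~= goalt
    qrow = q(cur,:);
    qrow(rewardt(cur,:) < 0) = -Inf;   % forbid invalid moves and obstacles
    qrow(path) = -Inf;                 % forbid states already on the path
    [best, move] = max(qrow);
    if isinf(best)                     % dead end: every neighbour already visited
        warning('No unvisited move left; the Q-table needs more training.');
        break;
    end
    path = [path, move];               %#ok<AGROW>
    cur  = move;
end
disp(path)

If the warning fires, the Q-table simply has not learned a consistent route yet; training for more episodes (for example with the sketch above) usually resolves it.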
Hope this helps!
