How do I correctly code a simulation of adaptive optimal control using value iteration?

Hello, I am Putri. I am a beginner MATLAB user, and I am trying to simulate adaptive optimal control using value iteration.
However, I am a bit lost and am having difficulty defining my initial value function and coding the iteration.
If someone could teach me how to do it, I would appreciate it.
Here is the code I have tried to build, but it is still far from doing what I want.
close all;
clc;
clear all;
% To simulate the system
A=[-0.0665 8 0 0; 0 -3.663 3.663 0; -6.86 0 -13.736 -13.736; 0.6 0 0 0]; %matrix A
B=[0; 0; 13.7335; 0]; %matrix B
C=eye(4); %output matrix
D=0;
x=[0;0.1;0;0]; %initial state
% Weights component
R=eye(1);
Q=eye(4);
% Discretize the system
Ts=0.01;%sampling time
sysc=ss(A,B,C,D); %continuous system
sysd=c2d(sysc,Ts,'zoh'); %discretize with zero order hold
% Solution of Discrete-ARE
[P,K,L] = idare(sysd.A,sysd.B,Q,R,[],[]);
P_DARE=P/100; %correction factor
x0 = [0;0.1;0;0];
t = 0:0.01:6;
x = initial(sysd,x0,t);
% Define the parameters for the value iteration algorithm
gamma = 1; % discount factor
max_iterations = 100; % maximum number of iterations
tolerance = 1e-6; % tolerance for convergence
% Define the state space of the MDP
P = [48.0537812864796 47.7268296813581 6.03847752285629 47.7170406287952;
47.7268296813581 78.9295235076091 12.4038561385049 38.5521293459321;
6.03847752285629 12.4038561385049 5.66971382690978 3.03582894965622;
47.7170406287952 38.5521293459321 3.03582894965622 235.144841887140];
%P = idare(sysd.A,sysd.B,Q,R,[],[]);
% Define the initial weight vector for the value function
W = [P(1,1); 2*P(1,2); 2*P(1,3); 2*P(1,4); P(2,2); 2*P(2,3); 2*P(2,4); P(3,3); 2*P(3,4); P(4,4)];
WP=[W; 0];
% Value Update
x1 = x0(1);
x2 = x0(2);
x3 = x0(3);
x4 = x0(4);
state = [x1 x2 x3 x4]; % define the state
%action= [u(1) u(2) u(3) u(4)]; % define action
% Define the features of the state space
psi = [x1^2; x1*x2; x1*x3; x1*x4; x2^2; x2*x3; x2*x4; x3^2; x3*x4; x4^2];
% Initialize the value function
V = W'*psi;
% Iteration for Parameter P
% Define the current policy
theta = [x1; x2; x3; x4];
[~,k] = idare(sysd.A,sysd.B,Q,R,[],[]); % second output is the 1x4 feedback gain
u = -k*theta;
h = u;
% Define the reward function
r = theta'*Q*theta + u'*R*u;
% Define the dynamics of the MDP
g = [0; 0; 13.7335; 0];
% Initialize a vector to store the value function at each iteration
V_iterations = zeros(max_iterations, 1);
% Batch least squares
Fsamples=60; %length of the simulation in samples
T=0.15; % sample time
dphi=[2*x1 0 0 0;
x2 x1 0 0;
x3 0 x1 0;
x4 0 0 x1;
0 2*x2 0 0;
0 x3 x2 0;
0 x4 0 x2;
0 0 2*x3 0;
0 0 x4 x3;
0 0 0 2*x4];
% Iterate until convergence or maximum number of iterations is reached
for k=1:Fsamples
i = 1:max_iterations;
% Compute the value update
V_new = r + gamma * V;
%W_new' == inv(V_new)*psi;
% Compute the policy improvement
h_new = -gamma/2 * R^(-1) * g'*V_new;
% Check for convergence
if norm(V_new - V) < tolerance
break;
end
% Update the value function and policy
V = V_new;
h = h_new;
end
% Plot the value function over the number of iterations
figure;
plot(1:i, V_iterations(1:i));
xlabel('Iteration');
ylabel('Value Function');
It seems I am confused about how to translate my algorithm into code.
I look forward to any clues.
Thank you.
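
For the initial value function, one quick sanity check is that the quadratic form x'*P*x and the basis form W'*psi give the same number for any state. The sketch below assumes the basis ordering x1^2, x1*x2, x1*x3, x1*x4, x2^2, x2*x3, x2*x4, x3^2, x3*x4, x4^2 (the ordering implied by the rows of dphi) and uses an arbitrary test state x that is not from the original post; P is the matrix listed in the code above.

```matlab
% P as listed in the question
P = [48.0537812864796 47.7268296813581 6.03847752285629 47.7170406287952; ...
     47.7268296813581 78.9295235076091 12.4038561385049 38.5521293459321; ...
     6.03847752285629 12.4038561385049 5.66971382690978 3.03582894965622; ...
     47.7170406287952 38.5521293459321 3.03582894965622 235.144841887140];

x = [0.3; 0.1; -0.2; 0.5];   % arbitrary test state (assumption, not from the post)

% Weight vector: diagonal entries once, off-diagonal entries doubled
W = [P(1,1); 2*P(1,2); 2*P(1,3); 2*P(1,4); P(2,2); ...
     2*P(2,3); 2*P(2,4); P(3,3); 2*P(3,4); P(4,4)];

% Quadratic basis in the same order as W
psi = [x(1)^2; x(1)*x(2); x(1)*x(3); x(1)*x(4); x(2)^2; ...
       x(2)*x(3); x(2)*x(4); x(3)^2; x(3)*x(4); x(4)^2];

V_quad  = x'*P*x;    % value from the quadratic form
V_basis = W'*psi;    % value from the weight/basis representation
fprintf('x''Px = %.10g, W''psi = %.10g\n', V_quad, V_basis);
```

If the two printed values differ, the entries of W and psi are probably not in the same order.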

4 Comments

I suggest that you show the mathematical algorithm (equations) of your adaptive optimal controller so that we can check whether you have coded it correctly in MATLAB.
Thank you for your suggestion. I really appreciate it.
I am trying to simulate the following algorithm.
Initialize
Select some control policy $h_0(x_k)$
(I am using $u_0 = -K_0 x_0$.)
Value update
$W_{j+1}^T \phi(x_k) = r(x_k, h_j(x_k)) + W_j^T \phi(x_{k+1})$
where W is the weight vector obtained from the parameter P of the discrete-time LQR solution,
$\phi$ (psi in my code) is the basis vector. I use a least-squares polynomial basis; with 4 states, n = 4, so it has n(n+1)/2 = 10 components,
r is the reward (cost function), $r = x_k^T Q x_k + u_k^T R u_k$,
$h(x_k) = -K x_k$, and
K is the feedback gain obtained by solving the ARE for the discrete system.
Iterate over j until convergence, then
Policy improvement
$h_{j+1}(x) = -\frac{\gamma}{2} R^{-1} g^T(x_k) \nabla\phi^T(x_{k+1}) W_{j+1}$
In the end, I want to get the state response plot of the system under value iteration and obtain the P parameter.
I hope I can get some help with this.
Thank you again.
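
The steps above could be coded roughly as follows. This is only a sketch under several assumptions that are not in the original post: the start is V_0 = 0 with h_0(x) = 0 instead of the DARE gain, the batch uses N = 60 random sample states, the helper names phi and w2P are made up for illustration, and the policy improvement uses the linear-quadratic closed form K = gamma*(R + gamma*Bd'*P*Bd)^(-1)*Bd'*P*Ad, which is what the gradient formula reduces to when the model x_{k+1} = Ad*x_k + Bd*u_k is known. The discretization uses expm so that no toolbox is needed. P_ref is a model-based recursion kept only for comparison with the least-squares result.

```matlab
% Plant and weights from the question
A = [-0.0665 8 0 0; 0 -3.663 3.663 0; -6.86 0 -13.736 -13.736; 0.6 0 0 0];
B = [0; 0; 13.7335; 0];
Q = eye(4);  R = 1;  Ts = 0.01;  gamma = 1;
n = size(A,1);  m = size(B,2);

% Zero-order-hold discretization via the matrix exponential (base MATLAB)
M  = expm([A B; zeros(m, n+m)] * Ts);
Ad = M(1:n, 1:n);
Bd = M(1:n, n+1:n+m);

% Quadratic basis phi(x) and map from weights W back to a symmetric P
phi = @(x) [x(1)^2; x(1)*x(2); x(1)*x(3); x(1)*x(4); x(2)^2; ...
            x(2)*x(3); x(2)*x(4); x(3)^2; x(3)*x(4); x(4)^2];
w2P = @(W) [W(1)   W(2)/2 W(3)/2 W(4)/2; ...
            W(2)/2 W(5)   W(6)/2 W(7)/2; ...
            W(3)/2 W(6)/2 W(8)   W(9)/2; ...
            W(4)/2 W(7)/2 W(9)/2 W(10)];

% Sample states for the batch least squares (assumption: random samples)
N = 60;
X = randn(n, N);
Phi = zeros(N, 10);
for k = 1:N
    Phi(k,:) = phi(X(:,k))';
end

W = zeros(10,1);  K = zeros(m,n);        % assumption: V_0 = 0, h_0(x) = 0
P_ref = zeros(n); K_ref = zeros(m,n);    % model-based recursion, comparison only
maxIter = 200;  tol = 1e-6;
for j = 1:maxIter
    % Value update: W_{j+1}'*phi(x_k) = r(x_k,h_j(x_k)) + gamma*W_j'*phi(x_{k+1})
    y = zeros(N,1);
    for k = 1:N
        xk  = X(:,k);
        uk  = -K*xk;                     % current policy h_j
        xk1 = Ad*xk + Bd*uk;             % next state
        y(k) = xk'*Q*xk + uk'*R*uk + gamma*(W'*phi(xk1));
    end
    W_new = Phi \ y;                     % batch least-squares solution
    dW = norm(W_new - W);
    W = W_new;
    P = w2P(W);

    % Policy improvement (linear-quadratic closed form)
    K = gamma*((R + gamma*(Bd'*P*Bd)) \ (Bd'*P*Ad));

    % Same iteration written directly in terms of P, for comparison
    Acl   = Ad - Bd*K_ref;
    P_ref = Q + K_ref'*R*K_ref + gamma*(Acl'*P_ref*Acl);
    K_ref = gamma*((R + gamma*(Bd'*P_ref*Bd)) \ (Bd'*P_ref*Ad));

    if dW < tol, break; end
end
fprintf('Stopped after %d iterations, norm(dW) = %.3g\n', j, dW);

% Closed-loop state response with the latest gain
x0 = [0; 0.1; 0; 0];  t = 0:Ts:6;
xs = zeros(n, numel(t));  xs(:,1) = x0;
for k = 1:numel(t)-1
    xs(:,k+1) = (Ad - Bd*K)*xs(:,k);
end
```

After the script runs, plot(t, xs') gives the state response and P holds the current estimate of the value matrix. With Ts = 0.01 the weights may still be changing after 200 iterations, so maxIter may need to be increased while watching the printed norm(dW).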
Have you solved this problem yet? I'm also having the same problem as you. Can you help me?
Hi @Dinh Tuan, could you please open a new thread and post your control problem there? Simply click 'Ask' to start. Providing the control algorithm and sharing the code by clicking the indentation icon would allow users to test and investigate the issue.


Answers (0)

Release

R2022a

Asked:

31 Jan 2023

Commented:

20 Mar 2024
