How to update the episode number?

ryunosuke tazawa on 10 Aug 2021

Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/895612-how-to-to-updata-episodes-number

I am writing a reinforcement learning program.

The goal is to train a simple pendulum to throw a ball at a target point.

However, training is progressing as shown in the figure below.

I suspect there is a problem with the episode reward.

Is this because the episodes are not being updated, that is, because the observations are not being updated?

Or is there some other cause?

Below is the code that updates the observations.

        function [Observation,Reward,IsDone,LoggedSignals] = step(this,Action)

            LoggedSignals = [];

            Force = getForce(this,Action);    % torque

            theta = this.State(1);            % pendulum angle
            w = this.State(2);                % pendulum angular velocity

            IsDone = false;

            R = 0;

            % Pendulum dynamics (Euler method)
            q2 = w - (this.g/this.L)*theta*this.Ts - this.b*this.Ts - Force*this.Ts;  % angular velocity
            q1 = theta + w*this.Ts;                                                    % angle

            % Ball dynamics
            ball_x = this.L * sin(q1);                 % initial x position of the ball
            ball_y = -this.L * cos(q1);                % initial y position of the ball
            ball_time = sqrt(2*abs(ball_y)/9.8);       % flight time of the ball
            ball_reach = ball_x + abs(q2).*ball_time;  % horizontal flight distance of the ball

            ball_gosa = ball_reach - this.Target;      % difference between flight distance and target point
            q3 = ball_gosa;

            % Reward condition:
            % if the difference between the target point and the flight distance
            % is 1 or less, a reward is given.
            if 0 < q3 && q3 < 1
                IsDone = true;
                R = this.RewardForStrike;
            else
                R = this.RewardForNotFalling;
            end

            Observation = [q1 q2 q3 Force]';           % observation vector

            this.State = Observation;

            this.IsDone = IsDone;

            Reward = getReward(this,R);

            notifyEnvUpdated(this);

        end
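
One way to check whether the episode reward is the problem might be to run the same update equations outside the environment class and count how often the reward condition is met. The following is only a minimal sketch and is not part of the original environment: every parameter value (g, L, b, Ts, Target), the constant test torque, and the number of steps are assumed placeholders, while the formulas are copied unchanged from step.

```matlab
% Standalone check of the update equations used in step().
% All numeric values below are assumed placeholders, not values
% taken from the original environment.
g = 9.8;         % gravitational acceleration [m/s^2]
L = 1.0;         % pendulum length [m] (assumed)
b = 0.1;         % damping coefficient (assumed)
Ts = 0.02;       % sample time [s] (assumed)
Target = 2.0;    % target distance [m] (assumed)

theta = 0;       % initial angle [rad]
w = 0;           % initial angular velocity [rad/s]
force = -5;      % constant test torque (assumed)
numSteps = 50;   % number of steps to simulate (assumed)
err = zeros(numSteps, 1);

for k = 1:numSteps
    % Same formulas as in step(), unchanged
    q2 = w - (g/L)*theta*Ts - b*Ts - force*Ts;  % angular velocity
    q1 = theta + w*Ts;                          % angle
    ball_x = L*sin(q1);
    ball_y = -L*cos(q1);
    ball_time = sqrt(2*abs(ball_y)/9.8);
    ball_reach = ball_x + abs(q2).*ball_time;
    err(k) = ball_reach - Target;               % q3 in step()

    % Carry the new angle and angular velocity into the next step
    theta = q1;
    w = q2;
end

% Steps on which the condition 0 < q3 && q3 < 1 would give the strike reward
hitSteps = find(err > 0 & err < 1);
fprintf('Reward condition met on %d of %d steps\n', numel(hitSteps), numSteps);
```

Plotting err against the step index for a few different torques would show whether q3 ever enters the interval (0, 1). Note that the condition as written only accepts a flight distance that exceeds the target, not one that falls short of it.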


Answers (0)