How to frame reward functions in Reinforcement Learning that track a desired value?

Question

DEBOTRINYA SUR on 12 Jan 2022

https://es.mathworks.com/matlabcentral/answers/1627430-how-to-frame-reward-functions-in-reinforcement-learning-that-tracks-a-desired-value

Answered: Aditya on 12 Sep 2023

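function y = boostgetreward(A)
% Reward intended to drive the first element of A toward 80
KK = A(1,1);
if KK >= 80
    % At or above the target: steep penalty (100 per unit of overshoot)
    y = -100*abs(KK - 80);
else
    % Below the target: mild penalty (1 per unit of undershoot)
    y = -abs(KK - 80);
end
end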

This is the reward function I came up with, but using it I am unable to reach the desired value of 80.


Answer 1

Aditya on 12 Sep 2023

https://es.mathworks.com/matlabcentral/answers/1627430-how-to-frame-reward-functions-in-reinforcement-learning-that-tracks-a-desired-value#answer_1307616

Hey Debotrinya,

I understand that you are unable to reach the desired value of 80 with the reward function you shared. Here are a few suggestions that might help.

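The reward function you provided penalizes the difference between the value ‘KK’ and the desired value of 80. However, the penalty is extreme and asymmetric: each unit of overshoot above 80 costs 100 times as much as each unit of undershoot below it. An agent trained on this reward may therefore learn to stay safely below 80 rather than risk overshooting, and so never actually reach the target.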
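To make the asymmetry concrete, the sketch below re-expresses the logic of ‘boostgetreward’ as a vectorized anonymous function and evaluates it one unit below and one unit above the target. The name ‘origReward’ is introduced here only for illustration, and the sketch assumes base MATLAB with no toolbox functions.

```matlab
% Same logic as boostgetreward, written as a vectorized anonymous function
% (origReward is a hypothetical name used only for this illustration)
target = 80;
origReward = @(KK) (KK >= target) .* (-100 * abs(KK - target)) + ...
                   (KK <  target) .* (-abs(KK - target));

% Compare one unit below the target with one unit above it
below = origReward(79);   % -1
above = origReward(81);   % -100
fprintf('KK = 79 -> %g, KK = 81 -> %g\n', below, above);
```

Being 1 above the target is punished as hard as being 100 below it, which is the imbalance described above.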

To encourage the system to reach 80, you can adjust the penalty or use a different approach, for example:

Adjust the penalty: Instead of the piecewise-linear penalty in your current function, try a penalty function that changes more smoothly as the value approaches 80. For example, a quadratic penalty such as -(KK - 80)^2 is small close to the target and grows quickly with distance, and because it is symmetric it does not punish overshoot more than undershoot. This gives a more gradual penalty near the target and can help the system converge towards the desired value.
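A minimal sketch of the quadratic alternative, again in base MATLAB. The name ‘quadReward’ and the sample values of ‘KK’ are hypothetical; inside ‘boostgetreward’ the equivalent change would be to replace the if/else with y = -(KK - 80)^2;

```matlab
target = 80;

% Quadratic, symmetric penalty (quadReward is a hypothetical name):
% near zero close to the target, grows fast with distance,
% and treats overshoot and undershoot equally
quadReward = @(KK) -(KK - target).^2;

% Evaluate on a few sample values around the target
KK = [70 79 79.9 80 80.1 81 90];
r  = quadReward(KK);
disp([KK; r]);   % first row: KK, second row: reward
```

Because the squared error grows quickly far from the target, you may want to divide it by a scale factor so that reward magnitudes stay in a range the agent learns from comfortably; the right scale depends on your environment and is not something this sketch determines.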

Use a reward shaping technique: Reward shaping adds extra rewards or penalties to guide the learning process. You can introduce intermediate rewards that encourage the system to make progress towards 80; for example, give a small positive reward when ‘KK’ is close to 80 and a larger positive reward when it reaches 80. This can help the agent learn the desired behaviour more efficiently.
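One possible scheme along these lines, sketched under assumptions: a dense term that always rewards getting closer, plus a bonus of +1 inside a "near" band and an extra +10 inside a tighter "reached" band. The band widths ‘nearTol’ and ‘hitTol’ and the bonus sizes are hypothetical values chosen for illustration, and this is a simple bonus scheme rather than formal potential-based reward shaping.

```matlab
target  = 80;
nearTol = 2;     % hypothetical width of the "close to target" band
hitTol  = 0.5;   % hypothetical width of the "target reached" band

% Dense guidance term: normalized distance penalty, higher when closer
dense = @(KK) -abs(KK - target) / target;

% Sparse bonuses: +1 inside the near band, an extra +10 inside the hit band
bonus = @(KK) 1  * (abs(KK - target) <= nearTol) + ...
              10 * (abs(KK - target) <= hitTol);

% Total shaped reward (hypothetical name)
shapedReward = @(KK) dense(KK) + bonus(KK);

disp(shapedReward([70 79 80]));
```

A tolerance band is used instead of testing KK == 80 exactly, because a continuous value will rarely land on exactly 80 in floating point.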

For more information on nonlinear penalty functions, see the following documentation page:

Hyperbolic penalty value for a point with respect to a bounded region - MATLAB hyperbolicPenalty (mathworks.com)

Thanks, and best regards,

Aditya Kaloji
