Beta distribution in PPO

Sourabh

2 Feb. 2024

Updated 15 Feb. 2024

I want to constrain the actions of my PPO agent. Can I use a Beta distribution for the policy so that the sampled actions stay within a bounded action space?

Here is the script for the networks I am using:

----------

commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer
    fullyConnectedLayer(1,Name="comPathOut")
    ];

% Define mean value path
meanPath = [
    fullyConnectedLayer(64,Name="meanPathIn")
    tanhLayer
    fullyConnectedLayer(64,Name="fc_2")
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    leakyReluLayer(0.1,Name="meanPathOut")
    ];

% Define standard deviation path
sdevPath = [
    fullyConnectedLayer(64,Name="stdPathIn")
    tanhLayer
    fullyConnectedLayer(64)
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    softmaxLayer(Name="stdPathOut")
    ];

% Add layers to layerGraph object
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,sdevPath);

% Connect paths
actorNet = connectLayers(actorNet,"comPathOut","meanPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","stdPathIn/in");

actorNetwork = dlnetwork(actorNet);

1 Comment

Kautuk Raj on 15 Feb. 2024

To use a Beta distribution for the action outputs in PPO, I think you would need to modify the network architecture so that it outputs the two shape parameters of the Beta distribution (alpha and beta) instead of a mean and a standard deviation. Both parameters must be positive, so each output path would typically end with an activation function that guarantees positivity, such as softplus.
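
As a rough sketch of what that could look like mathematically: the symbols $z_\alpha$ and $z_\beta$ (raw outputs of the two network paths) and $a_{\min}$, $a_{\max}$ (the action bounds) are my own notation, and the "+1" shift is an optional assumption, sometimes used to keep the density unimodal, not something required by the suggestion above.

```latex
\begin{aligned}
\operatorname{softplus}(z) &= \log\left(1 + e^{z}\right) > 0, \\
\alpha &= \operatorname{softplus}(z_\alpha) + 1, \qquad
\beta = \operatorname{softplus}(z_\beta) + 1, \\
x &\sim \operatorname{Beta}(\alpha, \beta), \qquad
p(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in [0, 1], \\
a &= a_{\min} + (a_{\max} - a_{\min})\, x .
\end{aligned}
```

Because $x$ always lies in $[0, 1]$, the rescaled action $a$ stays inside $[a_{\min}, a_{\max}]$ without any clipping. The rescaling is affine, so it only adds the constant $-\log(a_{\max} - a_{\min})$ to the log-probability, and that constant cancels in the PPO probability ratio.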
