Creating an actorLossFunction for ContinuousDeterministicActor
rtn on 24 May 2022
Answered: Takeshi Takahashi on 2 Jun 2022
Hi, in the example the actor loss function for an rlDiscreteCategoricalActor is the following:
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
Here are my action and observation specifications:
actInfo =
rlNumericSpec with properties:
LowerLimit: [2×1 double]
UpperLimit: [2×1 double]
Name: "CartPole Action"
Description: [0×0 string]
Dimension: [2 1]
DataType: "double"
obsInfo =
rlNumericSpec with properties:
LowerLimit: -Inf
UpperLimit: Inf
Name: "CartPole States"
Description: "pendulum_force, cart position, cart velocity"
Dimension: [4 1501]
DataType: "double"
Here is how I set up my actor and optimizer:
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
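For context, here is a minimal sketch of how the actor and optimizer above would typically be used inside a custom training loop, following the pattern of the MathWorks custom-training-loop example. The names observationBatch and actorLossData are placeholders for the data collected during an episode, and the exact output order of update should be verified against your release:
% Compute the loss gradient with respect to the actor parameters.
% actorLossFunction is whatever loss you end up using (see the accepted answer below).
actorGradient = gradient(actor,@actorLossFunction,{observationBatch},actorLossData);
% Apply one gradient step with the rlOptimizer created above.
[actor,actorOptimizer] = update(actorOptimizer,actor,actorGradient);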
To create my loss function, can I do the following?
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
0 comments
Accepted Answer
Takeshi Takahashi on 2 Jun 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions while rlContinuousDeterministicActor is for deterministic continuous actions. You need different formulations.
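For a deterministic continuous actor, the usual formulation (as in DDPG) is to maximize the critic's Q-value at the actions the actor produces, rather than a log-probability-weighted return. Here is a minimal, hedged sketch of that idea; the critic and the field names lossData.critic and lossData.observationBatch are illustrative assumptions, not part of the linked example:
function loss = deterministicActorLossFunction(action, lossData)
    % action{1} is the deterministic action output by the actor for the
    % observation batch (there is no probability distribution to index into).
    action = action{1};
    % Evaluate the critic Q(s,a) at the actor's actions. getValue is the
    % documented query function for rlQValueFunction critics; passing the
    % batch this way inside a loss used with gradient() is an assumption
    % that should be verified against the example.
    q = getValue(lossData.critic,{lossData.observationBatch},{action});
    % Maximizing the expected Q-value is equivalent to minimizing its negative mean.
    loss = -mean(q,'all');
end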
0 comments
More Answers (0)