syncParameters

Modify the learnable parameters of one approximator towards the learnable parameters of another approximator

Since R2022a

    Description

    zFcnAppx = syncParameters(xFcnAppx,yFcnAppx,smoothFactor) returns an updated function approximator object of the same type and configuration as xFcnAppx, with its learnable parameters moved towards those of yFcnAppx according to the smooth factor smoothFactor.

    Examples

    For this example, create two value function critics and sync their parameters.

    First, create a finite-set observation specification for a scalar that can take four different values.

    obsInfo = rlFiniteSetSpec(1:4);

    Create a table object. Table values are initialized to zero by default.

    table = rlTable(obsInfo);

    Create a base critic.

    Vx = rlValueFunction(table,obsInfo);

    Set the table values to different values.

    table.Table = [1 -1 -10 100]';

    Use the updated table to create a new critic.

    Vy = rlValueFunction(table,obsInfo);

    Sync the parameter values of the base critic Vx, moving them one fifth of the way towards the parameter values of the new critic Vy.

    Vz = syncParameters(Vx,Vy,0.2);

    Display the learnable parameters of the new critic Vz.

    Vz.Learnables{1}
    ans = 
      4x1 dlarray
    
        0.2000
       -0.2000
       -2.0000
       20.0000
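
    The displayed values can be checked by hand against the update rule given in the smoothFactor description. The following is a minimal sketch in plain MATLAB, assuming the critic parameters are just the table columns; the variable names Px, Py, Pz, and s are illustrative and not part of the toolbox API.

    ```matlab
    % Parameters of the base critic Vx (the table was initialized to zero).
    Px = [0 0 0 0]';

    % Parameters of the new critic Vy (the modified table values).
    Py = [1 -1 -10 100]';

    % Smooth factor used in the example.
    s = 0.2;

    % Convex combination: move Px one fifth of the way towards Py.
    Pz = s*Py + (1-s)*Px;
    disp(Pz)
    ```

    Because Px is all zeros here, the result is simply 0.2 times Py, which agrees with the values displayed for Vz.Learnables{1}.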
    
    

    Input Arguments

    Base function approximator object (xFcnAppx), specified as an actor or critic function approximator object.

    To create an actor or critic function object, use one of the following methods.

    • Create the function approximator object directly.

    • Obtain the existing critic from an agent using getCritic.

    • Obtain the existing actor from an agent using getActor.
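
    The methods above can be sketched as follows. This is only an illustration, assuming a table-based Q-value critic and a Q-learning agent; the names qTable, Qx, Qy, and Qz, the action set, and the smooth factor 0.5 are arbitrary choices made for this sketch.

    ```matlab
    % Observation and action specifications (illustrative values).
    obsInfo = rlFiniteSetSpec(1:4);
    actInfo = rlFiniteSetSpec([-1 1]);

    % Create a critic directly from a table, then build an agent around it.
    qTable = rlTable(obsInfo,actInfo);
    critic = rlQValueFunction(qTable,obsInfo,actInfo);
    agent = rlQAgent(critic);

    % Obtain the existing critic from the agent.
    Qx = getCritic(agent);

    % Create a second critic with the same parameter dimensions
    % but different table values.
    qTable.Table = ones(4,2);
    Qy = rlQValueFunction(qTable,obsInfo,actInfo);

    % Move the agent critic halfway towards Qy and store it back in the agent.
    Qz = syncParameters(Qx,Qy,0.5);
    agent = setCritic(agent,Qz);
    ```

    Extracting the critic with getCritic and writing it back with setCritic keeps the agent itself as the owner of the updated parameters.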

    New actor or critic object (yFcnAppx), specified as a function approximator object whose parameter cell array has the same dimensions as that of xFcnAppx.

    Smooth factor, specified as a positive scalar less than or equal to one. This factor regulates the extent to which the parameters of xFcnAppx are updated towards the parameters of yFcnAppx. The operation is akin to a single step of a first-order low-pass filter applied to the learnable parameters of xFcnAppx.

    Specifically, if Pz is the parameter vector of zFcnAppx, then:

    Pz = s*Py + (1-s)*Px

    where s is the smooth factor smoothFactor, and Py and Px are the parameter vectors of yFcnAppx and xFcnAppx, respectively.

    For example, if you use a smooth factor of 1, the parameters of zFcnAppx are equal to the parameters of yFcnAppx. If you use a smooth factor of 0.5, the parameters of zFcnAppx are equal to the average of the parameters of yFcnAppx and xFcnAppx.
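
    The low-pass filter analogy can be illustrated by applying the update rule repeatedly towards a fixed target. The following is a minimal sketch in plain MATLAB that mimics the arithmetic only and does not call syncParameters; the variable names, the number of steps, and the parameter values are assumptions made for illustration.

    ```matlab
    s  = 0.2;                 % smooth factor
    Px = zeros(4,1);          % starting parameters
    Py = [1 -1 -10 100]';     % fixed target parameters

    % Apply the update Pz = s*Py + (1-s)*Px ten times,
    % feeding each result back in as the new starting point.
    P = Px;
    for k = 1:10
        P = s*Py + (1-s)*P;
    end

    % Remaining distance to the target, relative to the initial distance.
    gap = norm(P - Py)/norm(Px - Py);
    disp(gap)
    disp((1-s)^10)
    ```

    Each step shrinks the distance to the target by a factor of 1-s, so after k steps the remaining gap is (1-s)^k of the original. A smaller smooth factor therefore gives slower, smoother tracking of yFcnAppx.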

    Output Arguments

    Updated target actor or critic object, returned as a function approximator object of the same type as xFcnAppx. The learnable parameter values of zFcnAppx are a convex combination of those in xFcnAppx and those in yFcnAppx. For example, as specified in the description of smoothFactor, a smooth factor of 1 makes the zFcnAppx parameters equal to the yFcnAppx parameters, while a smooth factor of 0.5 makes them equal to the average of the parameters in xFcnAppx and yFcnAppx.

    Version History

    Introduced in R2022a