allExperiences

Return all experiences in replay memory buffer

Since R2022b

    Description

    experience = allExperiences(buffer) returns all experiences stored in experience buffer buffer as individual experiences, each with a batch size of 1 and a sequence length of 1.

    experience = allExperiences(buffer,ConcatenateMode=mode) returns experiences concatenated along the dimension specified by mode. You can concatenate experiences along the batch dimension or the sequence dimension.

    Examples

    Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.

    obsContinuous = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[1;5]);
    obsDiscrete = rlFiniteSetSpec([1 2 3]);
    obsInfo = [obsContinuous obsDiscrete];

    Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous actions, each in a specified range.

    actInfo = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[5;10]);

    Create an experience buffer with a maximum length of 5,000.

    buffer = rlReplayMemory(obsInfo,actInfo,5000);

    Append a sequence of 10 random experiences to the buffer.

    for i = 1:10
        experience(i).Observation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        experience(i).Action = {actInfo.UpperLimit.*rand(2,1)};
        experience(i).NextObservation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        experience(i).Reward = 10*rand(1);
        experience(i).IsDone = 0;
    end
    
    append(buffer,experience);

    After appending experiences to the buffer, you can extract all of them. Extract the experiences as individual experiences, each with a batch size of 1 and a sequence length of 1.

    experience = allExperiences(buffer)
    experience = 10×1 struct array with fields:
        Observation
        Action
        NextObservation
        Reward
        IsDone
    
    

    Alternatively, you can extract all of the experiences as a single experience batch.

    expBatch = allExperiences(buffer,ConcatenateMode="batch")
    expBatch = struct with fields:
            Observation: {[2x1x10 double]  [1x1x10 double]}
                 Action: {[2x1x10 double]}
                 Reward: [9.5751 9.1574 7.4313 8.2346 1.8687 1.6261 5.0596 2.5428 3.5166 5.6782]
        NextObservation: {[2x1x10 double]  [1x1x10 double]}
                 IsDone: [0 0 0 0 0 0 0 0 0 0]
    
    

    Input Arguments

    Experience buffer, specified as a replay memory object, such as an rlReplayMemory object.

    Concatenation mode, specified as one of the following values.

    • "none" — Return experience as N individual experiences, each with a batch size of 1 and a sequence length of 1.

    • "batch" — Return experience as a single batch with a sequence length of 1.

    • "sequence" — Return experience as a single sequence with a batch size of 1.
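
    The "sequence" mode can be tried with a minimal sketch like the following. The specification sizes, the buffer length of 100, and the variable names seqExp and expSeq are assumptions chosen for illustration. The sketch prints the resulting array sizes instead of asserting them.

```matlab
% Minimal sketch: one observation channel and one action channel (assumed specs).
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([1 1]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% Append 10 random experiences as one struct array.
for k = 1:10
    seqExp(k).Observation = {rand(2,1)};
    seqExp(k).Action = {rand(1,1)};
    seqExp(k).NextObservation = {rand(2,1)};
    seqExp(k).Reward = rand(1);
    seqExp(k).IsDone = 0;
end
append(buffer,seqExp);

% Concatenate every stored experience along the sequence dimension.
expSeq = allExperiences(buffer,ConcatenateMode="sequence");

% Inspect the array sizes rather than assuming them.
disp(size(expSeq.Observation{1}))
disp(size(expSeq.Reward))
```

    Printing the sizes is a quick way to confirm which dimension holds the 10 experiences before passing the data to other code.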

    Output Arguments

    All N buffered experiences, returned as a structure array or structure. When mode is:

    • "none", experience is returned as a structure array of length N, where each element contains one buffered experience (batchSize = 1 and SequenceLength = 1).

    • "batch", experience is returned as a structure. Each field of experience contains all buffered experiences concatenated along the batch dimension (batchSize = N and SequenceLength = 1).

    • "sequence", experience is returned as a structure. Each field of experience contains all buffered experiences concatenated along the sequence dimension (batchSize = 1 and SequenceLength = N).
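
    The difference between the structure array and the single structure can be checked with a short sketch such as the one below. The specifications, the buffer size, and the names cmpExp, expNone, and expBatch are illustrative assumptions. The sketch uses the Length property of the buffer as N.

```matlab
% Minimal sketch with assumed specifications and 8 random experiences.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);
buffer = rlReplayMemory(obsInfo,actInfo,50);

for k = 1:8
    cmpExp(k).Observation = {rand(3,1)};
    cmpExp(k).Action = {rand(1,1)};
    cmpExp(k).NextObservation = {rand(3,1)};
    cmpExp(k).Reward = rand(1);
    cmpExp(k).IsDone = 0;
end
append(buffer,cmpExp);

% Default mode: one struct array element per buffered experience.
expNone = allExperiences(buffer);

% Batch mode: a single struct whose fields hold all experiences.
expBatch = allExperiences(buffer,ConcatenateMode="batch");

fprintf("Buffer length N: %d\n",buffer.Length);
fprintf("Elements in default output: %d\n",numel(expNone));
fprintf("Elements in batch output: %d\n",numel(expBatch));
fprintf("Rewards stored in batch output: %d\n",numel(expBatch.Reward));
```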

    experience contains the following fields.

    Observation, returned as a cell array with length equal to the number of observation specifications specified when creating the buffer. Each element of Observation contains a D_O-by-batchSize-by-SequenceLength array, where D_O is the dimension of the corresponding observation specification.

    Agent action, returned as a cell array with length equal to the number of action specifications specified when creating the buffer. Each element of Action contains a D_A-by-batchSize-by-SequenceLength array, where D_A is the dimension of the corresponding action specification.

    Reward value obtained by taking the specified action from the observation, returned as a 1-by-1-by-SequenceLength array.

    Next observation reached by taking the specified action from the observation, returned as a cell array with the same format as Observation.

    Termination signal, returned as a 1-by-1-by-SequenceLength array of integers. Each element of IsDone has one of the following values.

    • 0 — This experience is not the end of an episode.

    • 1 — The episode terminated because the environment generated a termination signal.

    • 2 — The episode terminated by reaching the maximum episode length.
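
    As an illustration of how the IsDone values might be used, the following sketch counts the buffered experiences that end an episode. The specifications and the names termExp, expAll, and numTerminal are assumptions for this example. Only the last appended experience is marked as terminal.

```matlab
% Minimal sketch with assumed specifications.
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([1 1]);
buffer = rlReplayMemory(obsInfo,actInfo,50);

% Append 10 experiences; mark only the last one as terminal (IsDone = 1).
for k = 1:10
    termExp(k).Observation = {rand(2,1)};
    termExp(k).Action = {rand(1,1)};
    termExp(k).NextObservation = {rand(2,1)};
    termExp(k).Reward = rand(1);
    termExp(k).IsDone = double(k == 10);
end
append(buffer,termExp);

% Extract everything as one batch and count nonzero termination signals.
expAll = allExperiences(buffer,ConcatenateMode="batch");
numTerminal = nnz(expAll.IsDone > 0);
fprintf("Experiences that end an episode: %d\n",numTerminal);
```

    Testing for values greater than 0 treats both termination causes (1 and 2) as the end of an episode. Compare against a specific value instead if the cause matters.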

    Version History

    Introduced in R2022b