allExperiences

Return all experiences in replay memory buffer

Since R2022b

Syntax

experiences = allExperiences(buffer)

experiences = allExperiences(buffer,Name=Value)

Description

experiences = allExperiences(buffer) returns all experiences stored in experience buffer buffer as individual experiences, each with a batch size of 1 and a sequence length of 1.

experiences = allExperiences(buffer,Name=Value) specifies the type and concatenation of the fields in experiences using one or more name-value arguments. You can specify whether to return the fields as dlarray objects and whether to store them on the GPU. You can also return the experiences concatenated along the batch dimension or the sequence dimension.

Examples

Extract All Experiences from Replay Memory Buffer

Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.

obsContinuous = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];

Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous actions in a specified range.

actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);

Create an experience buffer with a maximum length of 5000.

buffer = rlReplayMemory(obsInfo,actInfo,5000);

Append a sequence of 10 random experiences to the buffer.

for i = 1:10
    experience(i).Observation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    experience(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    experience(i).NextObservation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    experience(i).Reward = 10*rand(1);
    experience(i).IsDone = 0;
end

append(buffer,experience);

After appending experiences to the buffer, you can extract all of them. First, extract them as individual experiences, each with a batch size of 1 and a sequence length of 1.

experience = allExperiences(buffer)

experience=10×1 struct array with fields:
    Observation
    Action
    NextObservation
    Reward
    IsDone

Alternatively, you can extract all of the experiences as a single experience batch.

expBatch = allExperiences(buffer,ConcatenateMode="batch")

expBatch = struct with fields:
        Observation: {[2×1×10 double]  [1×1×10 double]}
             Action: {[2×1×10 double]}
             Reward: [9.5751 9.1574 7.4313 8.2346 1.8687 1.6261 5.0596 2.5428 3.5166 5.6782]
    NextObservation: {[2×1×10 double]  [1×1×10 double]}
             IsDone: [0 0 0 0 0 0 0 0 0 0]
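The "sequence" mode can be called the same way. The following is a minimal sketch, assuming Reinforcement Learning Toolbox is installed. It rebuilds a small single-channel buffer so that it runs on its own; the specifications and the variable names exps and expSeq are illustrative and not part of the original example. It displays the size of the concatenated observation rather than assuming it.

```matlab
% Illustrative single-channel specifications (assumed for this sketch).
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([2 1]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% Append 10 random experiences.
for i = 1:10
    exps(i).Observation = {rand(2,1)};
    exps(i).Action = {rand(2,1)};
    exps(i).NextObservation = {rand(2,1)};
    exps(i).Reward = rand;
    exps(i).IsDone = 0;
end
append(buffer,exps);

% Return everything as one sequence with a batch size of 1.
expSeq = allExperiences(buffer,ConcatenateMode="sequence");

% Inspect the layout of the first observation channel.
disp(size(expSeq.Observation{1}))
```

Comparing this size with the one returned for ConcatenateMode="batch" shows which dimension holds the 10 experiences in each mode.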

Input Arguments

`buffer` — Experience buffer
`rlReplayMemory` object | `rlPrioritizedReplayMemory` object | `rlHindsightReplayMemory` object | `rlHindsightPrioritizedReplayMemory` object

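Experience buffer, specified as one of the following replay memory objects.

rlReplayMemory
rlPrioritizedReplayMemory
rlHindsightReplayMemory
rlHindsightPrioritizedReplayMemory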

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ConcatenateMode="batch"

`ConcatenateMode` — Concatenation mode
`"none"` (default) | `"batch"` | `"sequence"`

Concatenation mode, specified as one of the following values.

"none" — Return experience as N individual experiences, each with a batch size of 1 and a sequence length of 1.
"batch" — Return experience as a single batch with a sequence length of 1.
"sequence" — Return experience as a single sequence with a batch size of 1.

`ReturnDlarray` — Option to return output as deep learning array
`false` (default) | `true`

Option to return output as deep learning array, specified as a logical value. When you specify ReturnDlarray as true, the fields of experiences are dlarray objects.

Example: ReturnDlarray=true
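As a minimal sketch of this option, assuming both Reinforcement Learning Toolbox and Deep Learning Toolbox are installed, the code below fills a small buffer and checks the class of a returned field. The specifications and the names exps and expDl are illustrative.

```matlab
% Illustrative single-channel specifications (assumed for this sketch).
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([1 1]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% Append 5 random experiences.
for i = 1:5
    exps(i).Observation = {rand(2,1)};
    exps(i).Action = {rand};
    exps(i).NextObservation = {rand(2,1)};
    exps(i).Reward = rand;
    exps(i).IsDone = 0;
end
append(buffer,exps);

% Request dlarray fields, concatenated into a single batch.
expDl = allExperiences(buffer, ...
    ConcatenateMode="batch", ...
    ReturnDlarray=true);

% Display the class of the first observation channel.
disp(class(expDl.Observation{1}))

% extractdata converts a dlarray back to an ordinary numeric array.
obsNumeric = extractdata(expDl.Observation{1});
```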

`ReturnGpuArray` — Option to return output as GPU array
`false` (default) | `true`

Option to return output as GPU array, specified as a logical value. When you specify ReturnGpuArray as true, the fields of experiences are stored on the GPU.

Setting this option to true requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Example: ReturnGpuArray=true
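Because this option depends on the available hardware, one possible pattern is to guard the call. The sketch below assumes Reinforcement Learning Toolbox is installed and uses canUseGPU to fall back to CPU arrays when no supported GPU is found. The specifications and the names exps, useGpu, and expOut are illustrative.

```matlab
% Illustrative single-channel specifications (assumed for this sketch).
obsInfo = rlNumericSpec([2 1]);
actInfo = rlNumericSpec([1 1]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% Append 5 random experiences.
for i = 1:5
    exps(i).Observation = {rand(2,1)};
    exps(i).Action = {rand};
    exps(i).NextObservation = {rand(2,1)};
    exps(i).Reward = rand;
    exps(i).IsDone = 0;
end
append(buffer,exps);

% Request GPU arrays only when a supported GPU is available.
useGpu = canUseGPU;
expOut = allExperiences(buffer, ...
    ConcatenateMode="batch", ...
    ReturnGpuArray=useGpu);

% Display where the first observation channel is stored.
disp(class(expOut.Observation{1}))
```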

Output Arguments

`experiences` — All buffered experiences
structure array | structure

All N buffered experiences, returned as a structure array or structure. When ConcatenateMode is:

"none", experiences is returned as a structure array of length N, where each element contains one buffered experience (batchSize = 1 and SequenceLength = 1).
"batch", experiences is returned as a structure. Each field of experiences contains all buffered experiences concatenated along the batch dimension (batchSize = N and SequenceLength = 1).
"sequence", experiences is returned as a structure. Each field of experiences contains all buffered experiences concatenated along the sequence dimension (batchSize = 1 and SequenceLength = N).
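To illustrate how the three layouts relate, the following sketch uses only base MATLAB and a hand-built structure array standing in for the "none" output. It is not how allExperiences is implemented. It assumes a [2 1] observation channel, so that the batch dimension is dimension 3 and the sequence dimension is dimension 4; the names expArray, obsCells, obsBatch, and obsSeq are hypothetical.

```matlab
% Hypothetical stand-in for the "none" output: N individual experiences,
% each holding one [2 1] observation channel.
N = 4;
for k = N:-1:1
    expArray(k).Observation = {rand(2,1)};
    expArray(k).Reward = k;
end

% Collect the first observation channel of every experience.
obsCells = arrayfun(@(e) e.Observation{1}, expArray, ...
    UniformOutput=false);

% Batch layout: stack along dimension 3 (assumed batch dimension).
obsBatch = cat(3,obsCells{:});

% Sequence layout: stack along dimension 4 (assumed sequence dimension),
% which leaves a singleton batch dimension.
obsSeq = cat(4,obsCells{:});

disp(size(obsBatch))
disp(size(obsSeq))
```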

Each experience contains the following fields.

`Observation` — Observation
cell array

Observation, returned as a cell array with length equal to the number of observation specifications specified when creating the buffer. Each element of Observation contains a D_O-by-batchSize-by-SequenceLength array, where D_O is the dimension of the corresponding observation specification.

`Action` — Agent action
cell array

Agent action, returned as a cell array with length equal to the number of action specifications specified when creating the buffer. Each element of Action contains a D_A-by-batchSize-by-SequenceLength array, where D_A is the dimension of the corresponding action specification.

`Reward` — Reward value
scalar | array

Reward value obtained by taking the specified action from the observation, returned as a 1-by-1-by-SequenceLength array.

`NextObservation` — Next observation
cell array

Next observation reached by taking the specified action from the observation, returned as a cell array with the same format as Observation.

`IsDone` — Termination signal
integer | array

Termination signal, returned as a 1-by-1-by-SequenceLength array of integers. Each element of IsDone has one of the following values.

0 — This experience is not the end of an episode.
1 — The episode terminated because the environment generated a termination signal.
2 — The episode terminated by reaching the maximum episode length.
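These codes make it straightforward to summarize how episodes ended. The sketch below uses base MATLAB only; the isDone vector is hypothetical data standing in for the IsDone field of a concatenated output, and the counter names are illustrative.

```matlab
% Hypothetical IsDone values, standing in for the IsDone field
% of an output concatenated along the batch or sequence dimension.
isDone = [0 0 1 0 0 2 0 1];

% Flatten so the code works regardless of which dimension holds the data.
isDone = isDone(:);

numOngoing    = nnz(isDone == 0);  % steps that did not end an episode
numTerminated = nnz(isDone == 1);  % ended by an environment termination signal
numTruncated  = nnz(isDone == 2);  % ended by reaching the maximum episode length

fprintf("ongoing: %d, terminated: %d, truncated: %d\n", ...
    numOngoing,numTerminated,numTruncated);
```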

Version History

Introduced in R2022b
