How to format sequences to store in experience buffer for DRQN?

Imola Fodor on 27 Feb 2024
Commented: Imola Fodor on 4 Jun 2024
For DRQN (Deep Recurrent Q-Learning) in a POMDP, entire sequences need to be stored in the replay buffer instead of individual transitions. How should the data be constructed for the agent.ExperienceBuffer object? For example, for the Observation element I have tried a 1x1 cell containing a numchannel x sequencelength matrix, and also a numchannel x sequencelength cell array directly. The idea was to then sample a minibatch of sequences instead of a minibatch of transitions.
In every attempt I get this error:
Error using rl.replay.rlReplayMemory/validateExperience
Observation dimensions must match the dimensions specified in the corresponding specifications.
More specifically, when debugging I see that in the first case (1x1 cell) the code fails at:
    for obsCh = 1:numObsChannels
        if ~all(size(NewObs{obsCh}) == obj.InternalReplayMemory_.ObservationDimension{obsCh})
            error(message('rl:general:errIncorrectObservationDim'));
        end
    end
And in the second case at:
    if numObsChannels ~= numel(NewObs)
        error(message('rl:general:errIncorrectObservationDim'));
    end
MATLAB supports DQN agents with recurrent layers, so there must be some way to store these sequences.
Thank you,
Imola

Answers (1)

Shubham on 29 May 2024
Hi Imola,
To handle sequences in the replay buffer for Deep Recurrent Q-Networks (DRQN) in a Partially Observable Markov Decision Process (POMDP) setting in MATLAB, you need to structure your observations and experiences to match the format expected by agent.ExperienceBuffer (an rl.replay.rlReplayMemory object, judging by your error message) or by any custom replay buffer you implement. The error you're encountering comes from a mismatch between the dimensions of the observations you're trying to store and the dimensions the replay memory expects from the observation specifications.
Here's how you can approach this:
1. Observation and Action Space Specification
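First, ensure that your observation and action specifications match what you actually store. The replay memory validates every stored observation against the specification, so the dimensions declared there must equal the size of the array in each cell. If you want a whole sequence in a single experience, the sequence length has to be part of the declared dimensions; otherwise each stored experience should be a single time step.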
2. Storing Sequences
When storing sequences, the key is to maintain consistency in how observations are represented. If your environment's observation for a single timestep is a vector of size [numChannels, 1], then for a sequence of length sequenceLength, you'd typically have an observation of size [numChannels, sequenceLength].
However, MATLAB's RL framework expects each observation to be encapsulated in a cell array where each cell corresponds to one "channel" or dimension of the observation space. For sequence data, you need to ensure that the entire sequence for a single channel is contained within a single cell, and the dimensions match what the environment and agent expect.
3. Correct Approach for Sequences
Given the errors you're encountering, let's clarify the correct approach:
  • For a 1x1 Cell Approach: If you're trying to encapsulate the entire sequence in a 1x1 cell, ensure that the cell contains a matrix where each column represents a timestep and the rows represent the features of the observation. The first check you quoted compares size(NewObs{obsCh}) with the stored ObservationDimension, so this only passes if the specification itself has size [numChannels, sequenceLength]. This approach might require custom handling in your experience replay mechanism to correctly sample and utilize these sequences.
  • For a Cell Array Directly Matching numChannel x sequenceLength: The second check you quoted compares numel(NewObs) with the number of observation channels, so a cell array with numChannel x sequenceLength cells fails unless the agent really has that many channels. The cell array should have one cell per observation channel, i.e. size [1, numChannels], and each cell should contain the data for that channel (for a sequence, an array whose last dimension has length sequenceLength), not one cell per element of a [numChannels, sequenceLength] matrix.
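The two checks can be tried outside the toolbox to see which layouts pass. The following is only a minimal sketch in plain MATLAB: obsDims, the 4x1 observation size, and the matchesSpec helper are hypothetical stand-ins that imitate the two checks quoted in the question, not the toolbox's own validation code.

```matlab
% Hypothetical single-channel observation spec: a 4x1 vector per time step.
obsDims = {[4 1]};                   % one entry per observation channel
numObsChannels = numel(obsDims);

% Candidate layouts for one stored observation.
singleStep  = {rand(4, 1)};          % one cell per channel, one time step
wholeSeq    = {rand(4, 10)};         % 1x1 cell holding a 4x10 sequence
cellPerElem = num2cell(rand(4, 10)); % 4x10 cell array of scalars

% Imitation of the two checks from the question: cell count first,
% then the size of the array in each cell against the declared size.
matchesSpec = @(obs) numel(obs) == numObsChannels && ...
    all(cellfun(@(x, d) isequal(size(x), d), obs(:), obsDims(:)));

okSingleStep  = matchesSpec(singleStep);
okWholeSeq    = matchesSpec(wholeSeq);
okCellPerElem = matchesSpec(cellPerElem);
fprintf('single step: %d, whole sequence: %d, cell per element: %d\n', ...
    double(okSingleStep), double(okWholeSeq), double(okCellPerElem));
```

With a per-time-step specification like this one, only the single-step layout passes. The whole-sequence layout would pass only if obsDims were {[4 10]}.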
4. Sampling Mini-batches
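When sampling mini-batches of sequences, you must ensure that each sampled experience contains the full sequence as required for the DRQN's input, that is, a contiguous run of time steps from a single episode. This might involve custom modifications to the sampling logic to ensure that sequences are kept intact and not broken up. As far as I know, rlDQNAgentOptions also has a SequenceLength option that sets the trajectory length used in training when the critic is recurrent, so it is worth checking that before writing custom sampling code.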
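If you do end up managing the buffer yourself, one possible way to keep sequences intact is to store single transitions and sample windows of consecutive ones. This is a rough sketch in plain MATLAB, not the toolbox's sampling code: obsBuf, isDoneBuf, the episode boundaries, and all sizes are made-up placeholders.

```matlab
rng(0);                                  % reproducible sketch
numSteps = 50;  obsDim = 4;  seqLen = 8;  batchSize = 3;

% Stand-in buffer: one column per stored transition (single time steps).
obsBuf    = rand(obsDim, numSteps);
isDoneBuf = false(1, numSteps);
isDoneBuf([20 35 50]) = true;            % hypothetical episode ends

% A start index is valid when the window fits in the buffer and no
% episode ends before the last step of the window.
starts = 1:(numSteps - seqLen + 1);
valid  = arrayfun(@(s) ~any(isDoneBuf(s:s+seqLen-2)), starts);
starts = starts(valid);

% Draw a minibatch of windows, laid out as obsDim x batchSize x seqLen.
picked   = starts(randi(numel(starts), 1, batchSize));
batchObs = zeros(obsDim, batchSize, seqLen);
for b = 1:batchSize
    idx = picked(b):(picked(b) + seqLen - 1);
    batchObs(:, b, :) = reshape(obsBuf(:, idx), obsDim, 1, seqLen);
end
disp(size(batchObs));
```

Actions, rewards, and next observations would be gathered with the same idx so that every field of a sampled sequence stays aligned. Windows that cross an episode end are simply excluded here; padding and masking short episodes is another option.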
5. Debugging Tips
  • Check Dimensionality at Every Step: Print out the dimensions of your observations at various points (creation, before storing, and during retrieval) to ensure they match expectations.
  • Align with Agent Specifications: Double-check the agent's expected input dimensions, especially if you're using recurrent layers, to ensure compatibility.
  • Custom Replay Buffer: If the built-in replay memory behind agent.ExperienceBuffer doesn't meet your needs for sequence handling, consider implementing a custom replay buffer that explicitly supports sequences in the way you require.
Remember, the key to successfully implementing DRQN in MATLAB is ensuring that your observation sequences are correctly formatted and that your replay buffer is capable of handling, storing, and sampling these sequences in a way that aligns with the expected input structure of your recurrent neural network.
  1 Comment
Imola Fodor on 4 Jun 2024
Hello Shubham, this answer is very long, but unfortunately I don't see any concrete solutions. Can you point me to some documentation where I can read about "...each observation to be encapsulated in a cell array where each cell corresponds to one "channel" or dimension of the observation space. For sequence data, ..."? Also, I see statements such as "This might involve custom modifications to the sampling logic " or "This approach might require custom handling in your experience replay mechanism "...

Release

R2022b
