Normalization across time dimension in "Sequence-to-Sequence Classification Using 1-D Convolutions"

I have been exploring the "Sequence-to-Sequence Classification Using 1-D Convolutions" example for 1-D temporal convolutions.
My questions relate to the following section of code within the residualBlock function:
% Normalization.
dim = find(dims(dlY)=='T');
mu = mean(dlY,dim);
sigmaSq = var(dlY,1,dim);
epsilon = 1e-5;
dlY = (dlY - mu) ./ sqrt(sigmaSq + epsilon);
My understanding is that the normalisation parameters mu and sigmaSq are calculated using all time steps in the given sequence - is this correct?
If so, does this mean the model is not fully causal, since the value at time step t is influenced by the values at t+1, t+2, etc. during the normalisation step? This applies to both training and testing in the example.
I understand the example states: "In this context, "causal" means that the activations computed for a particular time step cannot depend on activations from future time steps", indicating that causality does not necessarily apply to every step of the model.
Would this normalisation method then be incompatible with, for example, real-time classification, where we have data for time t (and prior time steps) and want to predict which human activity is currently taking place, but do not yet have t+1, t+2, ...? We could only calculate mu and sigmaSq from [t0, ..., tn-1, t], so a different normalisation method would be needed for such a use case?

Answers (1)

Aditya on 4 Jun 2024
Yes, your understanding is correct on several points. Let's break down the implications of the code snippet you've provided, especially in the context of causality and real-time processing:
  1. Normalization Across Time Steps: The normalization parameters (mu and sigmaSq) are indeed calculated using all time steps in the given sequence. The dimension dim is identified where the 'T' (time) dimension of the dlY (deep learning array or tensor) exists, and then mean and var are computed along this dimension. This means that for any given time step t, the normalization is influenced by all time steps, including future ones (t+1, t+2, etc.); a short sketch illustrating this follows this list.
  2. Causality Concerns: As you've correctly identified, this approach breaks the causality principle for models where the output at time t should only depend on inputs from time t and earlier. In scenarios where future data points (t+1, t+2, etc.) influence the normalization at time t, the model is not fully causal. This is particularly relevant in streaming or real-time applications where future inputs are not available.
  3. Real-Time Classification: For real-time classification tasks, such as predicting human activity at the current moment based on past and present data, the described normalization method would indeed be incompatible. Since you only have access to data up to the current time step t, you cannot use future data points for normalization as they are simply not available.
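As a quick check of points 1 and 2, here is a minimal sketch (array sizes and variable names are made up for illustration) showing that perturbing only the last time step changes the normalized activation at the first time step, because mu and sigmaSq are computed over the whole sequence:
rng(0);
X = dlarray(randn(3,1,5), "CBT");        % 3 channels, 1 observation, 5 time steps
dim = find(dims(X) == 'T');              % time dimension
epsilon = 1e-5;
mu = mean(X, dim);                        % mean over ALL time steps
sigmaSq = var(X, 1, dim);                 % population variance over ALL time steps
Y1 = (X - mu) ./ sqrt(sigmaSq + epsilon);
X2 = X;
X2(:,:,end) = X2(:,:,end) + 10;           % change only the LAST time step
mu2 = mean(X2, dim);
sigmaSq2 = var(X2, 1, dim);
Y2 = (X2 - mu2) ./ sqrt(sigmaSq2 + epsilon);
% Nonzero differences at the FIRST time step: the output at time t
% depends on future time steps through mu and sigmaSq.
disp(extractdata(Y1(:,:,1) - Y2(:,:,1)));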
In summary, for real-time or causal processing tasks, the normalization method used in the code snippet would need to be replaced or adapted to ensure that predictions at time t are not influenced by data from future time steps. Layer normalization or other causal normalization techniques would be more appropriate choices for such use cases.
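If a causal variant is needed, one possible adaptation (a sketch assuming the input is a formatted dlarray with a 'T' dimension, not code from the shipped example) is to normalize each time step using only the statistics of time steps 1..t:
function Y = causalNormalize(X)
% Causal normalization sketch: at each time step t, normalize with the
% mean and population variance of time steps 1..t only, so no future
% time steps influence the output at t.
dim = find(dims(X) == 'T');
epsilon = 1e-5;
T = size(X, dim);
Y = X;
idx = repmat({':'}, 1, ndims(X));
for t = 1:T
    idxPast = idx;  idxPast{dim} = 1:t;   % steps 1..t (no future data)
    idxNow  = idx;  idxNow{dim}  = t;     % current step only
    past = X(idxPast{:});
    mu = mean(past, dim);
    sigmaSq = var(past, 1, dim);
    Y(idxNow{:}) = (X(idxNow{:}) - mu) ./ sqrt(sigmaSq + epsilon);
end
end
Calling dlY = causalNormalize(dlY) in place of the normalization lines above keeps the block usable when only data up to the current time step is available (this naive loop recomputes the statistics at every step; running sums would be more efficient). Another option is to normalize over the channel dimension per time step, as layer normalization does, which is causal by construction since it never mixes time steps.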

