
Applying large logical masks to large arrays quickly, row wise

28 views (last 30 days)
George Hackney on 15 Dec 2022
Commented: Jan on 17 Dec 2022
I am filtering many large NxM arrays (n > 500000, m > 10) by applying an NxM mask to each, generating many column vectors. The mask can only be true in a single column per row, so the resultant column vectors will be nx1.
Directly applying the mask to the array is not an option, as MATLAB applies the mask in a column-wise fashion. This is undesirable because the rows in the data represent time stamps.
Is there a quicker way to apply a mask in a row-wise fashion, or to convert a mask to linear indices?
Example:
%% SETUP
% Data to be filtered
data = ...
[ 17 24 1 8 15; ...
23 5 7 14 16; ...
4 6 13 20 22; ...
10 12 19 21 3; ...
11 18 25 2 9 ];
% Mask to be applied
mask = ...
[ 0 0 1 0 0 ; ...
0 0 1 0 0 ; ...
0 1 0 0 0 ; ...
0 1 0 0 0 ; ...
1 0 0 0 0 ...
];
mask = logical(mask);
%% DESIRED DATA
wanted = [ 1;7;6;12;11]
wanted = 5×1
     1
     7
     6
    12
    11
%% DIRECT APPLICATION OF MASK
tic
direct = data(mask); % [11;6;12;1;7]
toc
Elapsed time is 0.002715 seconds.
disp(direct)
    11
     6
    12
     1
     7
%% CURRENT SLOW SOLUTION
tic
[row,col] = find(mask); % This line is slow when called many times with large arrays
[row,sortIdx] = sort(row);
col = col(sortIdx);
idx = sub2ind(size(mask), row, col);
current = data(idx);
toc
Elapsed time is 0.008832 seconds.
disp(current)
     1
     7
     6
    12
    11
4 comments
Dyuman Joshi on 16 Dec 2022
I see, it's top to bottom rather than left to right.
I have a query regarding your comment below, if you don't mind answering @Jan
"The creation of large temporary arrays is expensive."
How do I know whether something is memory- or time-expensive (I don't know if they are different or not)?
Jan on 17 Dec 2022
You can simply count how much memory a piece of code costs. Split the execution into atomic parts, e.g.:
Z = exp(sin(X + Y).^2); % where X and Y are matrices
% In steps:
A = X + Y;
B = sin(A);
C = B.^2;
Z = exp(C);
Three temporary arrays are required. How expensive this is depends on the size of the arrays and the CPU. If the arrays fit into the CPU caches, there is no latency from waiting for the slow RAM.
How expensive the calculations are also depends on the CPU, and even on the Matlab version. SIN, EXP and the element-wise power can be multi-threaded; Matlab then distributes the processing across several cores. How many cores are used for a specific data size depends on the implementation.
There is no general way to predict how much time is needed for the processing and the memory access. You have to try it. Look at my answers to your question: I've tried 10 different algorithms, but posted only the most promising versions.
When I write a function, I implement different methods and store them in a unit-test function. On the one hand I use them to test the output; on the other hand I can check, for new Matlab versions, whether the formerly fastest implementation is still the best one.
Don't forget: The time for solving a problem is:
T_all = T_design + T_program + T_debug + T_documentation + T_run
Avoid premature optimization of the run time. Write stable code first, and then focus on the bottleneck determined by the profiler.
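For illustration, a minimal timeit-based sketch of such a comparison (only a sketch; transSol and matchSol are copied from the accepted answer below):
% Time two variants of the row-wise masking and check that they agree
n = 500000;
m = 10;
data = rand(n, m);
mask = false(n, m);
mask(sub2ind([n, m], 1:n, randi([1, m], 1, n))) = true;
t1 = timeit(@() transSol(data, mask));
t2 = timeit(@() matchSol(data, mask));
fprintf('transSol: %.4f s, matchSol: %.4f s\n', t1, t2);
assert(isequal(transSol(data, mask), matchSol(data, mask)), 'Results differ');
function data = transSol(data, mask) % Transpose both matrices
    D = data.';
    data = D(mask.');
end
function R = matchSol(data, mask) % find + sub2ind
    [I, J] = find(mask);
    R(I) = data(sub2ind(size(data), I, J));
    R = R(:);
end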


Accepted Answer

Jan on 15 Dec 2022
Edited: Jan on 15 Dec 2022
Here is a set of different approaches. Run the code to see timings locally on your computer.
n = 500000;
m = 10;
data = rand(n, m);
mask = false(n, m);
mask(sub2ind([n, m], 1:n, randi([1,m], 1, n))) = true;
tic;
for k = 1:100
y0 = origSol(data, mask);
end
toc
Elapsed time is 3.558161 seconds.
tic;
for k = 1:100
y1 = transSol(data, mask);
end
toc
Elapsed time is 2.124697 seconds.
assert(isequal(y0, y1), 'Results differ');
tic;
for k = 1:100
y2 = matchSol(data, mask);
end
toc
Elapsed time is 2.345099 seconds.
assert(isequal(y0, y2), 'Results differ');
tic;
for k = 1:100
y3 = loopSol(data, mask);
end
toc
Elapsed time is 1.696199 seconds.
assert(isequal(y0, y3), 'Results differ');
% ========================================================================
function data = origSol(data, mask) % Written by George Hackney
    [row, col] = find(mask);
    [row, sortIdx] = sort(row);
    col = col(sortIdx);
    idx = sub2ind(size(mask), row, col);
    data = data(idx);
end
function data = transSol(data, mask) % Transpose both matrices
    D = data.';
    data = D(mask.');
end
function R = matchSol(data, mask) % Leaner version of the original code
    [I, J] = find(mask);
    R(I) = data(sub2ind(size(data), I, J));
    R = R(:);
end
function R = loopSol(data, mask) % C-style loops
    [m, n] = size(data);
    R = zeros(m, 1);
    for in = 1:n
        for im = 1:m
            if mask(im, in)
                R(im) = data(im, in);
            end
        end
    end
end
Transposing large arrays is expensive, so the speed-up of transSol() is not impressive.
The ugly loop solution is inspired by the C-mex implementation, which is provided in my 2nd answer. Timings on my R2018b, i5-3320M:
% 2.60 sec Original
% 2.29 sec transpose data and mask
% 1.83 sec matchSol(): find(mask)
% 1.13 sec dull loops
% 0.45 sec C-mex
5 comments
Jan on 16 Dec 2022
Edited: Jan on 16 Dec 2022
@George Hackney: Yes, you are told to use fast vectorized code. But this is not necessarily the code with the shortest run-time.
There is an easy rule: The creation of large temporary arrays is expensive. In this case:
function R = transSol(data, mask) % Transpose both matrices
D = data.'; % Expensive
M = mask.'; % Expensive
R = D(M); % Fairly efficient
end
To my surprise, even D(M) can be accelerated by a simple C-mex function: FEX: CopyMask. But the most expensive parts are the explicit transposes of the inputs. CPUs have become very fast, even when just one core is used; in comparison, the RAM access can be more expensive than the actual processing. If the processed memory does not fit into the CPU caches, this slows the processing down.
I do love Matlab's elegant vectorized commands. The compact code is nice, easy to maintain and less prone to typos compared to nested loops. But "vectorized" means that vectors (or, in general, arrays) are processed. If this requires large temporary arrays, the speed suffers.
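As a rough illustration (only a sketch, using the array size from the question), the temporaries created by the explicit transposes can be counted directly:
% Sketch: memory of the temporary arrays in transSol
n = 500000;
m = 10;
bytesD = n * m * 8; % D = data.' -> temporary double array, 8 bytes per element (~38 MB)
bytesM = n * m * 1; % M = mask.' -> temporary logical array, 1 byte per element (~5 MB)
fprintf('temporaries: %.1f MB\n', (bytesD + bytesM) / 2^20);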
Jan on 16 Dec 2022
@George Hackney: Another try to improve the speed: Actually, processing a matrix column-wise is cheaper, because the CPU imports memory in blocks of 64 bytes into its caches (the "cache line size"). Therefore accessing neighboring bytes is faster than random access. Matlab stores arrays in column-wise order, so processing them column-wise is faster.
In spite of this, the row-wise processing in loopSolRow() is faster, because the inner loop can be stopped after the first match (there is only 1 TRUE per row). This saves half the loop on average, if the TRUEs are randomly distributed.
n = 500000;
m = 10;
data = rand(n, m);
mask = false(n, m);
mask(sub2ind([n, m], 1:n, randi([1,m], 1, n))) = true;
tic;
for k = 1:100
y3 = loopSol(data, mask);
end
toc
Elapsed time is 1.613006 seconds.
tic;
for k = 1:100
y3 = loopSolRow(data, mask);
end
toc
Elapsed time is 1.497016 seconds.
% ============================================================
function R = loopSol(data, mask) % C-style loops, column-wise
    [m, n] = size(data);
    R = zeros(m, 1);
    for in = 1:n
        for im = 1:m
            if mask(im, in)
                R(im) = data(im, in);
            end
        end
    end
end
function R = loopSolRow(data, mask) % C-style loops, row-wise
    [m, n] = size(data);
    R = zeros(m, 1);
    for im = 1:m
        for in = 1:n
            if mask(im, in)
                R(im) = data(im, in);
                break;
            end
        end
    end
end


More Answers (1)

Jan on 15 Dec 2022
Edited: Jan on 15 Dec 2022
A C-mex version:
// File: maskTCopy.c
// Compile: mex -O -R2018a maskTCopy.c
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double    *data, *out;
    mxLogical *mask;
    mwSize    m, n, im, in;

    data = mxGetDoubles(prhs[0]);
    m    = mxGetM(prhs[0]);
    n    = mxGetN(prhs[0]);
    mask = mxGetLogicals(prhs[1]);

    plhs[0] = mxCreateDoubleMatrix(m, 1, mxREAL);
    out     = mxGetDoubles(plhs[0]);

    for (in = 0; in < n; in++) {
        for (im = 0; im < m; im++) {
            out[im] += *mask++ * *data++;
        }
    }
}
Under my Matlab R2018b this needs 0.39 sec, compared with 1.83 sec for matchSol() from my other answer and 2.65 sec for the original version. [EDITED, new version of the C code, 0.45 -> 0.39 sec]
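A usage sketch (assuming the file above is saved as maskTCopy.c on the Matlab path and a supported C compiler is configured):
% Sketch: compile once, then call the mex file like a normal function.
% The accumulation trick in the C code relies on at most one TRUE per row.
mex -O -R2018a maskTCopy.c
n = 500000;
m = 10;
data = rand(n, m);
mask = false(n, m);
mask(sub2ind([n, m], 1:n, randi([1, m], 1, n))) = true;
y = maskTCopy(data, mask); % n-by-1 vector of the row-wise masked values
D = data.';
assert(isequal(y, D(mask.')), 'mex result differs from the transpose method');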
