Vectorise or Parallel Computing

Question

Mingzhi Shihua on 20 Nov 2019

Direct link to this question

https://es.mathworks.com/matlabcentral/answers/492181-vectorise-or-parallel-computing

Commented: Mingzhi Shihua on 24 Nov 2019

Can this for loop be vectorized, or rewritten with parfor instead? If so, how should I do it?

for edgeID = 1:size(IE,1)
    self = selfs(edgeID);
    sdl(self) = sdl(self)+sdl_edge(edgeID); % add frac to self
    res(:,self) = res(:,self)+flux_edge(:,edgeID); % add flux to self residual
end % internal edge iteration ends

"selfs" is an index array in no particular order. I want to loop over this array and accumulate each edge's values into the position it names (so the targets are not visited in the order 1,2,3,4,5).

I have tried several ways but failed...

2 comments

Jan on 20 Nov 2019

Edited: Jan on 20 Nov 2019

Some example data would be nice, because then we can test the suggestions without needing to invent inputs. The values of selfs are not unique, are they?

Mingzhi Shihua on 21 Nov 2019

selfs is something like [2,4,7,8,5,9,1,0,3,6,8,2,5,6]. As the loop index runs over 1,2,3,..., the entries to update are visited in the order 2,4,7,... The exact variables are uploaded. Initialize sdl and res as follows:

sdl = zeros(1,7219);
res = zeros(5,7219);

This means that no element of selfs exceeds 7219.
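For readers without the uploaded variables, the loop can be reproduced on a small stand-in data set. This is only a sketch: the sizes and values below are invented for illustration (6 targets instead of 7219, 7 edges), nEdge stands in for size(IE,1), and selfs is assumed to contain positive integers only, since MATLAB indices start at 1.

```matlab
% Hypothetical stand-in data; the real arrays were attached to the thread.
nCell     = 6;                      % stands in for 7219
selfs     = [2 4 6 2 5 4 2];        % target indices, with repeats
nEdge     = numel(selfs);           % stands in for size(IE,1)
sdl_edge  = (1:nEdge) / 10;         % one scalar per edge
flux_edge = repmat(1:nEdge, 5, 1);  % one 5-by-1 column per edge

sdl = zeros(1, nCell);
res = zeros(5, nCell);
for edgeID = 1:nEdge
    self         = selfs(edgeID);
    sdl(self)    = sdl(self)    + sdl_edge(edgeID);     % accumulate scalar
    res(:, self) = res(:, self) + flux_edge(:, edgeID); % accumulate column
end
```

Because index 2 appears three times in this selfs, sdl(2) and res(:,2) each receive three contributions. Repeated targets like this are what make the loop an accumulation rather than a plain assignment.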

Answer 1

Jan on 21 Nov 2019

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/492181-vectorise-or-parallel-computing#answer_402577

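This loop cannot be parallelized as it stands, because different iterations can write to the same element of sdl and the same column of res. If flux_edge were a vector rather than a matrix, accumarray alone would solve the problem efficiently. Try this: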

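% UNTESTED and most likely BUGGY!
% Assumes selfs contains positive integer indices.
idx         = selfs(:);            % accumarray needs a column of subscripts
sdl         = sdl + accumarray(idx, sdl_edge(:), [numel(sdl), 1]).';
[G, ids]    = findgroups(idx);     % splitapply needs gap-free group numbers
resCell     = splitapply(@(c) {sum(c, 1).'}, flux_edge.', G);
res(:, ids) = res(:, ids) + cat(2, resCell{:});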

The values of selfs were not posted, so I cannot test the code and I assume it contains serious bugs. I expect you can find the remaining problems and adapt the code until it meets your needs.

If the problem is time-critical (the bottleneck of the whole program), I'd write a C-mex function. Accumulating in cells and joining them afterwards is not memory-efficient.
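One way to avoid the cell accumulation while staying in MATLAB might be to give accumarray a two-column (row, column) subscript list, or to multiply by a sparse selection matrix. The following is a sketch on invented inputs, not the thread's data. The names nRow, nCol, rows, cols, S and res2 are introduced here for illustration, and selfs is assumed to hold positive integers no larger than the number of columns of res.

```matlab
% Hypothetical small inputs (not the thread's data).
selfs     = [2 4 6 2 5 4 2];
nEdge     = numel(selfs);
sdl_edge  = (1:nEdge) / 10;
flux_edge = repmat(1:nEdge, 5, 1);
sdl       = zeros(1, 6);
res       = zeros(5, 6);

nRow = size(res, 1);
nCol = size(res, 2);

% sdl: one subscript per edge, accumulated into a column of length nCol.
sdl = sdl + accumarray(selfs(:), sdl_edge(:), [nCol, 1]).';

% res: build a (row, column) subscript pair for every entry of flux_edge.
rows = repmat((1:nRow).', 1, nEdge);   % row subscript of each entry
cols = repmat(selfs(:).', nRow, 1);    % target column of each entry
res  = res + accumarray([rows(:), cols(:)], flux_edge(:), [nRow, nCol]);

% Alternative: S(e, c) = 1 when edge e contributes to column c, so
% flux_edge * S sums the edge columns that share a target.
S    = sparse(1:nEdge, selfs, 1, nEdge, nCol);
res2 = full(flux_edge * S);
```

Both variants build the full result in one call, so repeated targets are summed rather than overwritten. Whether either beats the plain loop depends on the problem size and would need to be measured.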

The size of selfs matters. It might be more efficient to first collect the distinct values with unique and then run the loop over that list:

% UNTESTED
v   = unique(selfs);
sdl = zeros(1, 7219);
res = zeros(5, 7219);
for iv = 1:numel(v)
    av         = v(iv);
    mask       = (selfs == av);
    sdl(av)    = sum(sdl_edge(mask));
    res(:, av) = sum(flux_edge(:, mask), 2);
end

If this runs reasonably fast, you can parallelize it with parfor:

% UNTESTED
v   = unique(selfs);
nv  = numel(v);
A   = zeros(1, nv);
B   = zeros(5, nv);
parfor iv = 1:nv
    av       = v(iv);
    mask     = (selfs == av);
    A(iv)    = sum(sdl_edge(mask));
    B(:, iv) = sum(flux_edge(:, mask), 2);
end
sdl       = zeros(1, 7219);
sdl(v)    = A;
res       = zeros(5, 7219);
res(:, v) = B;

4 comments (2 older comments not shown)

Jan on 23 Nov 2019

Sorry, the original code needs 0.002 seconds on my R2018b system. I do not see a way to accelerate this substantially, because it is very fast already. My suggested solutions are much slower than the original approach.

Do you work with much larger problems than the posted data?
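A comparison like this can be repeated on one's own machine with tic and toc. The sketch below uses random stand-in data: nEdge = 10000 is an assumption (the thread does not state the number of edges), and the sparse-matrix variant is only one possible vectorized candidate, not something proposed in the thread. No timings are claimed here, because they depend on the machine and MATLAB release.

```matlab
% Hypothetical problem size; only nCell = 7219 comes from the thread.
nCell     = 7219;
nEdge     = 10000;                  % assumed, not from the thread
selfs     = randi(nCell, 1, nEdge); % random targets, repeats likely
sdl_edge  = rand(1, nEdge);
flux_edge = rand(5, nEdge);

% Reference: the original loop.
tic
sdl = zeros(1, nCell);
res = zeros(5, nCell);
for edgeID = 1:nEdge
    self         = selfs(edgeID);
    sdl(self)    = sdl(self)    + sdl_edge(edgeID);
    res(:, self) = res(:, self) + flux_edge(:, edgeID);
end
tLoop = toc;

% Candidate: multiply by a sparse edge-to-target selection matrix.
tic
S    = sparse(1:nEdge, selfs, 1, nEdge, nCell);
sdl2 = full(sdl_edge  * S);
res2 = full(flux_edge * S);
tSparse = toc;

fprintf('loop: %.4f s, sparse: %.4f s\n', tLoop, tSparse);
fprintf('max deviation: %.2e\n', max(abs(res(:) - res2(:))));
```

A single tic/toc pair is noisy for runs this short, so repeating each variant many times and averaging gives a more reliable picture.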

Mingzhi Shihua on 24 Nov 2019

Actually, the whole code repeats that part more than 20000 times. Of course, self_edge and flux_edge change each time. But yes, I think you are right: the most time-consuming part is probably updating those two.
