Parfor overhead: local cores vs. cluster cores

Brandon on 19 May 2021
Commented: Edric Ellis on 20 May 2021
I have a parfor loop that takes its inputs from a very large cell array, and every element of the cell array is eventually used over the course of the loop. The computation takes about 150 seconds on 20 local cores, but about 500 seconds on 20 cluster cores. (The cluster has 100 cores, which I would like to use for scaling.)
Two questions:
1) Is it safe to assume that this time difference is due to network communication latency?
2) If the answer to (1) is yes, is there a more efficient way to send the data in the cell array? As a highly simplified example of what I currently have:
for model_it = 1:100
    % some operations to create cell1, which is of length k
    output_store = cell(1, k); % preallocate the sliced output
    parfor ih = 1:k
        temp = cell1{ih};
        out = f(temp); % some operations done to temp
        output_store{ih} = out;
    end
    % some operations that use output_store to create the inputs for cell1 on the next model_it
end
I do not believe parallel.pool.Constant is an option here, because the data in cell1 changes on every model iteration. Do I have other options for setting up this problem?
1 Comment
Edric Ellis on 20 May 2021
Try using ticBytes and tocBytes to see just how much data is being sent. Is there any way you can invert things to run parfor as the outer loop?
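As a minimal sketch of that measurement, the parfor loop can be wrapped in ticBytes and tocBytes as shown here. The values of k, cell1, and f are hypothetical placeholders standing in for the real data and function from the question; only the ticBytes/tocBytes pattern is the point.

```matlab
% Hypothetical stand-ins for the real data and per-element function.
k = 8;
cell1 = arrayfun(@(n) rand(200), 1:k, 'UniformOutput', false);
f = @(x) sum(x(:));             % placeholder for the real work on each element

pool = gcp;                     % current pool, or starts the default pool
output_store = cell(1, k);

ticBytes(pool);                 % start counting bytes moved to and from workers
parfor ih = 1:k
    output_store{ih} = f(cell1{ih});
end
bytes = tocBytes(pool);         % one row per worker: [sent to worker, received from worker]

totalSentMB = sum(bytes(:, 1)) / 2^20;
totalRecvMB = sum(bytes(:, 2)) / 2^20;
fprintf('Sent to workers: %.2f MB, received from workers: %.2f MB\n', ...
    totalSentMB, totalRecvMB);
```

Running the same measurement on the local pool and on the cluster pool shows whether the volume of data is the same in both cases. If it is, and the volume is large relative to the cluster's network bandwidth, transfer time is a plausible explanation for the slowdown.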


Answers (0)
