Why are parfor loops so inefficient?

Matthew Phillips on 20 Dec 2013
Commented: Walter Roberson on 21 Dec 2013
In terms of memory, that is. A parfor loop uses vastly more memory than its for-loop counterpart, apparently because it makes copies of all of the data for each thread. But it does this even when the data are read-only, so such copies are completely unnecessary; simultaneous reads of a piece of data from multiple threads are generally fine. Moreover, MATLAB clearly already knows which data are read-only, through its variable 'classification'. Yet the copies are made anyway. I have lost a lot of time as my system grinds to a halt when trying to run parallelized code on large data files. Is there any way to remedy the situation? Or is it just a programming fail we have to live with (at least for now)?
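A minimal sketch of the pattern being described (the variable names are illustrative, not from any actual code in this thread): an array that is only read inside the loop, but is not indexed by the loop variable in a sliced form, gets classified as a broadcast variable, and each worker receives its own copy of it.

D = rand(5000);              % large read-only array, roughly 200 MB of doubles
result = zeros(1, 100);
parfor k = 1:100
    % D is used whole in every iteration, so it cannot be sliced;
    % it is broadcast, i.e. copied, to every worker in the pool.
    result(k) = sum(D(:)) / k;
end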

Accepted Answer

Matt J on 20 Dec 2013
Edited: Matt J on 20 Dec 2013
It isn't quite the case that copies of all the data are always made. Sliced variables are not copied, nor are distributed arrays (used with SPMD). I think you're expected to partition your computation so as to take advantage of that; the sketch below illustrates the difference.
"simultaneous reads of a piece of data from multiple threads are just fine"
It might help if you elaborate on that. My understanding was that multiple threads trying to read from the same memory location is a major problem in parallel computing, because some threads then have to wait idly for their turn at access.
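For illustration, a minimal sketch (again with made-up variable names) of restructuring the loop so that the large array becomes a sliced variable, assuming each iteration really only needs one column of the data:

D = rand(5000);                    % same large read-only array as before
colSums = zeros(1, size(D, 2));
parfor k = 1:size(D, 2)
    % D is indexed only as D(:,k), so it is classified as sliced:
    % each worker is sent just the columns its iterations need,
    % rather than a full copy of D.
    colSums(k) = sum(D(:, k));
end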
  7 comments
Matt J on 21 Dec 2013
"Unfortunately MATLAB does not offer any tools to force alignment, or to propagate alignment"
What if you assign A(I,J) and B(I,J) to cell arrays?
C{1} = A(I,J);
C{2} = B(I,J);
Walter Roberson on 21 Dec 2013
No joy, Matt J: each element of a cell array has its own header block, which includes a pointer to the data block, and users have no control over the alignment of that data block. If the data blocks are small enough, they come out of the "small store" that the memory manager uses rather than out of complete system blocks allocated as needed for larger memory. And if the blocks are large enough to come out of complete system blocks, then they get allocated at the same relative offset, which is often the worst case for traversing arrays.
After that, when operations are done on the arrays, if the operation is one of a number of common patterns and the arrays are large enough, they are going to be copied for use by BLAS, and there is no control over how they get allocated in that copying.


More Answers (0)
