Why does the number of workers decrease while running parfor?

한범 on 20 Jun 2022
Commented: Edric Ellis on 21 Jun 2022
I am using parallel computing to speed up a calculation.
At the beginning, I defined mypool = parpool('local',3); and started the code. I expected the job to take a little more than 50 hours.
However, 3 days have already passed since it started. When I checked why, I found "Number of workers: 1". (I saw it by hovering the cursor over the four green vertical bars in the bottom-left corner of the MATLAB window.) I am sure it was 3 at the beginning.
Why does this happen? If the number of workers decreases automatically, there is no point in using parallel computing.
Could it be related to memory? The job deals with huge files, and when I tried to use more than 3 workers, the memory allocation error below occurred.
Unexpected Standard exception from MEX file.
What() is:bad allocation

Answers (1)

Edric Ellis on 20 Jun 2022
The number of workers in a local parallel pool decreases like this only when one of the worker processes terminates (i.e. crashes with a segmentation fault or similar).
parfor will try to continue even after workers crash, by running the loop iterations on remaining workers. You should get some indication that this is happening in the command window.
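As a rough sketch, the current pool size can also be read programmatically rather than from the status indicator, for example between parfor loops or once the loop has finished. This assumes a pool may or may not be running, so gcp is called with 'nocreate' to avoid starting a new one.

```matlab
% Query the current pool without creating a new one.
p = gcp('nocreate');
if isempty(p)
    disp('No parallel pool is running.');
else
    % NumWorkers reflects the workers currently in the pool.
    fprintf('Pool currently has %d worker(s).\n', p.NumWorkers);
end
```

If the reported number is lower than the size requested in parpool, that is consistent with one or more workers having terminated.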
You could check for crash dump files in the directory returned by this command:
c = parcluster("local");
c.JobStorageLocation
There will be a bunch of "Job##" directories there. Look for the most recent one; that will probably be the one corresponding to your currently running parallel pool.
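A minimal sketch of that search, assuming the "Job##" directories sit directly under JobStorageLocation and that their modification time is a reasonable proxy for "most recent". The variable names (jobDirs, newest, files) are only illustrative.

```matlab
% Locate the newest "Job##" directory for the local cluster profile
% and list the files inside it together with their sizes.
c = parcluster("local");
jobDirs = dir(fullfile(c.JobStorageLocation, "Job*"));
jobDirs = jobDirs([jobDirs.isdir]);               % keep directories only
[~, order] = sort([jobDirs.datenum], "descend");  % newest first
jobDirs = jobDirs(order);

if isempty(jobDirs)
    disp("No Job## directories found.");
else
    newest = fullfile(jobDirs(1).folder, jobDirs(1).name);
    disp(newest);
    files = dir(newest);
    files = files(~[files.isdir]);                % skip '.' and '..'
    for k = 1:numel(files)
        fprintf("%-40s %10d bytes\n", files(k).name, files(k).bytes);
    end
end
```

Printing the file sizes makes it easy to spot whether any log or dump file in that directory actually contains data.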
  2 Comments
한범 on 21 Jun 2022
I found the "Job##" directories and looked into the most recent one, but there seems to be no useful information.
There are some '*.log' files in the directory, but they are all empty (0 bytes).
I also found 'Task#.common/in/out/state.mat' files, but they contain no crash info either.
Edric Ellis on 21 Jun 2022
You can cause workers to emit diagnostic logging information by running
setenv('MDCE_DEBUG', 'true')
before creating the pool. I'm not sure if it will help though...
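Put together, the ordering might look like the sketch below. It assumes any existing pool should be shut down first so that the new workers start after the variable is set; the pool size of 3 is taken from the question.

```matlab
% Shut down any existing pool so the new workers are started afterwards.
delete(gcp('nocreate'));

% Set the variable in the client session before the pool is created.
setenv('MDCE_DEBUG', 'true');

% Start the pool with the same size used in the question.
mypool = parpool('local', 3);
```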

Release

R2021a
