Why are pool workers going inactive while many iterations remain?

I have a task wrapped in a parfor loop in which text data files are read, converted to numeric values, downsampled to common signal frequencies, and saved as .mat files. There are many thousands of these text files, ranging in size from ~100 kilobytes to 3 gigabytes. The process runs on a workstation with an i9-10980XE (18 cores/36 threads, 128 GB RAM) using the default cluster settings. When the script first launches, I can see in the resource monitor that the processes for all 18 workers are consuming 100% CPU on each of their threads.
If I check back several hours later, most of the workers have stopped contributing, with only 1-4 still running at 100%. The other workers still exist but are dormant, as confirmed both by sustained 0% CPU usage and by fewer files changing simultaneously than at launch.
At this point there are still hundreds if not thousands of files left to process, so I am confused as to why the available workers are not being used. I can see no evidence of any errors that would have forced the affected workers to go dormant, and no indication that hardware resources became a limiting factor. If a worker were to somehow stop mid-task, I would also see partial log files that were created but never completed, and this is not happening. It appears that most workers complete a small percentage of the overall task and then go dormant, while a smaller subset of the pool does the vast majority of the work.
I have seen the RangePartitionMethod and SubrangeSize pool parameters mentioned as a possible solution in other questions, but in those cases the issue sounds like the result of relatively few expected iterations per worker and inconsistent work per iteration. In my situation, given the considerably larger number of iterations compared to the pool size, I am assuming that the number of files and the distribution of file sizes are relatively consistent between workers.
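For readers unfamiliar with the two parameters mentioned above, here is a minimal sketch of how they are passed to parfor through parforOptions. It is not taken from the original script: the workload is simulated with pause, and n and workSeconds are hypothetical stand-ins for the file list. With a fixed subrange size of 1, each worker requests one iteration at a time, so an idle worker should pick up the next pending iteration.

```matlab
% Sketch only: simulate uneven per-iteration cost instead of reading real files.
n = 200;                                      % hypothetical number of files
workSeconds = 0.01 + 0.2*(rand(1, n) > 0.9);  % roughly 10% of iterations are "large"

pool = gcp;  % current pool, or start the default one
opts = parforOptions(pool, ...
    "RangePartitionMethod", "fixed", ...
    "SubrangeSize", 1);

done = false(1, n);
parfor (k = 1:n, opts)
    pause(workSeconds(k));  % stand-in for read/convert/downsample/save
    done(k) = true;
end
```

A subrange size of 1 maximizes scheduling overhead per iteration, which is probably acceptable only when each iteration is long relative to the dispatch cost.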
UPDATE:
Based on Jeff's and Walter's answers, I've modified my script, and it now uses all workers for the entire parfor execution.
I wrote a function that sets the work partitions so that each worker gets an initial batch; combined across all workers, these batches cover about 50% of the total work. The remaining 50% is passed to workers as single items as they become available. This obviously creates some overhead compared with pre-assigning everything, but in my application each execution on a worker is passed only a file name and is otherwise quite isolated in terms of I/O. At least in my case, any cost from the overhead is more than made up for by keeping all of the workers at close to 100% utilization.
opts = poolOpts(n);
parfor (i = 1:n, opts)
    % do stuff
end

function opts = poolOpts(iterations)
    pool = gcp('nocreate'); % pool handle
    if isempty(pool)
        pool = parpool;
    end
    nw = pool.NumWorkers; % number of workers in pool
    initChunk = floor(iterations/2); % number of iterations to assign at start = 50%
    initWorkerChunk = floor(initChunk/nw); % number of iterations per worker to assign at start
    % vector of subrange sizes: nw initial batches, then single iterations
    poolPartitions = [repmat(initWorkerChunk,1,nw) ones(1,iterations-initWorkerChunk*nw)];
    poolPartitions = poolPartitions(poolPartitions > 0); % drop empty batches when iterations < 2*nw
    opts = parforOptions(pool,"RangePartitionMethod",@(~,~) poolPartitions);
end
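As a worked example of the partition arithmetic in poolOpts, the sketch below evaluates the same expressions for hypothetical values of 100 iterations and 4 workers, without creating a pool. Because of the two floor calls, the initial batches cover 48 iterations rather than exactly 50, and the remaining 52 are issued singly.

```matlab
% Hypothetical sizes, chosen only to illustrate the arithmetic.
iterations = 100;
nw = 4;

initChunk = floor(iterations/2);        % 50
initWorkerChunk = floor(initChunk/nw);  % 12
sizes = [repmat(initWorkerChunk, 1, nw) ones(1, iterations - initWorkerChunk*nw)];

disp(sizes(1:6))     % 12 12 12 12 1 1
disp(numel(sizes))   % 56 subranges in total
disp(sum(sizes))     % 100, the sizes must add up to the iteration count
```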
  1 comment
Matt J on 28 Jul 2023
How many files are being processed in total and how many have been fully processed when you check back?


Accepted Answer

Jeff Miller on 28 Jul 2023
What you would like is for any free processor to take up any waiting task (me too), but for some reason that's not how it works. Instead, parfor assigns all of the iterations to the different processors at the start, essentially making a little task queue for each processor. If too many slow tasks go into the queue for one processor (i.e., too many big files, in your case), that processor may still be chugging its way through its queue (with lots of unstarted tasks still in its queue) long after all of the other processors are finished and doing nothing. I think you have to use RangePartitionMethod to allocate tasks (files) more equally across processors.
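One way to spread large files more evenly, sketched here as an illustration and not as the method used in this thread, is to sort the file list by size so the largest files are started first, and then hand out iterations one at a time. The folder inputDir, the "*.txt" pattern, and the loop body are assumptions. The sketch also assumes that subranges are dispatched roughly in index order, which parfor does not guarantee.

```matlab
% Sketch only: inputDir and the "*.txt" pattern are assumptions.
inputDir = pwd;
files = dir(fullfile(inputDir, "*.txt"));
names = string({files.name});
bytes = [files.bytes];

% Largest files first, so the longest jobs start early instead of piling up at the end.
[bytes, order] = sort(bytes, "descend");
names = names(order);

pool = gcp;  % current pool, or start the default one
opts = parforOptions(pool, "RangePartitionMethod", "fixed", "SubrangeSize", 1);

processed = false(1, numel(names));
parfor (k = 1:numel(names), opts)
    % Stand-in for the real read/convert/downsample/save step.
    fprintf("%s (%d bytes)\n", names(k), bytes(k));
    processed(k) = true;
end
```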
Maybe there is something helpful in this question
  6 comments
Walter Roberson on 31 Jul 2023
I no longer recall exactly what happens when the number of parfor iterations matches the number of cores.
... but I would suggest that at that point it might make more logical sense to use parfeval() instead of parfor.
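A minimal sketch of the parfeval approach suggested above, under the assumption that each task is independent. The work function is a hypothetical stand-in whose cost grows with k. Each parfeval call queues one task, and a worker takes the next queued task as soon as it is free; fetchNext returns results in completion order.

```matlab
% Sketch only: "work" is a hypothetical stand-in for processing one file.
pool = gcp;  % current pool, or start the default one
n = 20;
work = @(k) sum(rand(1, 1e5*k));  % cost grows with k to mimic uneven file sizes

% Queue every task up front; idle workers pull from the queue.
futures(1:n) = parallel.FevalFuture;
for k = 1:n
    futures(k) = parfeval(pool, work, 1, k);
end

% Collect results as tasks finish, in whatever order that happens.
results = zeros(1, n);
for j = 1:n
    [idx, value] = fetchNext(futures);
    results(idx) = value;
end
```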
Jeff Miller on 2 Aug 2023
@Daniel Bengtson Thanks for the update--that looks like a very useful example.

Release

R2021b
