Memory management when using GPU in parfor-loop?

Matthias on 27 Oct 2014
Answered: Matthias on 28 Oct 2014
Hi,
I have a machine with four CPU cores and one GPU. In my code, there's a parfor loop that performs computations on gpuArrays, i.e. four CPU workers share one GPU. I've already determined that this is faster than performing all computations on the CPU, or using the GPU without the parfor loop.
My problem is that the GPU memory fills up inexplicably when using parfor. There's no problem when using regular "for".
Schematically, my code looks like this:
gpu = gpuDevice;
bigGpuArray = gpuArray(bigArray);
% Size of bigArray is something like 1000 x 1000 x 1000.
n = size(bigArray, 3);
gpuMemBeforeLoop = gpu.AvailableMemory/gpu.TotalMemory;
for i = 1:n
    bigGpuArray(:,:,i) = subFunction(bigGpuArray(:,:,i));
    % subFunction is a function that is faster when run on the GPU.
    gpuMemDuringLoop = gpu.AvailableMemory/gpu.TotalMemory;
    disp(gpuMemDuringLoop);
end
newBigArray = gather(bigGpuArray);
gpuMemAfterLoop = gpu.AvailableMemory/gpu.TotalMemory;
When using a normal for loop, I can see from gpuMemBeforeLoop, gpuMemDuringLoop and gpuMemAfterLoop that the memory usage stays constant during the loop, as expected.
However, if I replace the "for" by "parfor", then the memory usage increases linearly with the number of loop iterations and stays high until I call
pctRunOnAll reset(gpuDevice);
I'm surprised by this because I thought that the parallel workers could share the GPU memory smartly: I hoped that each worker would only have to receive a handle/pointer to the data that's already on the GPU. Instead, it looks as if each worker creates a separate copy of the GPU data. Even worse, the parallel workers seem to forget to delete these copies after the parfor loop (gpuMemAfterLoop in my example above shows higher memory use in the parfor than in the for case).
Is this behavior expected? Can I change my approach to avoid this memory leak?
Thanks, Matthias

Answers (2)

Matthias on 28 Oct 2014
I have now partially solved the problem by using spmd instead of parfor. This way, I can manually slice my variables. However, the memory still isn't cleared after the work is done.
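The post does not show the spmd version, so the following is only a minimal sketch of what manual slicing could look like. The function name processSlabsWithSpmd, the block-partitioning scheme, and the stand-in body of subFunction are assumptions for illustration. It uses labindex and numlabs, which match the release of that era; newer releases also offer spmdIndex and spmdSize.

```matlab
function newBigArray = processSlabsWithSpmd(bigArray)
% Hypothetical sketch: each worker puts only its own block of slices on the GPU.
n = size(bigArray, 3);
spmd
    % Split 1:n into numlabs contiguous blocks; this worker takes block labindex.
    edges = round(linspace(0, n, numlabs + 1));
    idx = (edges(labindex) + 1):edges(labindex + 1);

    slab = gpuArray(bigArray(:, :, idx));   % only this block is copied to the GPU
    for k = 1:numel(idx)
        slab(:, :, k) = subFunction(slab(:, :, k));
    end
    slabHost = gather(slab);                % bring the result back to host memory
    slab = [];                              % drop the worker's reference to the GPU data
end

% slabHost is a Composite on the client; collect one block per worker.
parts = cell(1, numel(slabHost));
for w = 1:numel(slabHost)
    parts{w} = slabHost{w};
end
newBigArray = cat(3, parts{:});
end

function out = subFunction(in)
% Stand-in for the poster's GPU-friendly function (assumption).
out = 2 * in;
end
```

Note that bigArray is still sent in full to every worker's host memory in this sketch; only the GPU copy is limited to the worker's block. If GPU memory still appears to be held after the block finishes, the explicit pctRunOnAll reset(gpuDevice); from the question remains an option, bearing in mind that reset invalidates any gpuArrays still living on those workers.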

Matt J on 27 Oct 2014
Edited: Matt J on 27 Oct 2014
"Instead, it looks as if each worker creates a separate copy of the GPU data."
This part might not be so surprising. Parfor always makes duplicates of data needed by the workers, including data pointed to by handle objects. The only exceptions are sliced variables. I guess this is evidence that gpuArrays can't be sliced, but instead behave like handles.
"Even worse, the parallel workers seem to forget to delete these copies after the parfor loop (gpuMemAfterLoop in my example above shows higher memory use in the parfor than in the for case)."
That does seem like a bug.
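Building on the point about sliced variables, one possible workaround is to keep the big array in host memory, let parfor slice it, and move each slice to the GPU inside the loop body. This is a sketch under assumptions rather than a confirmed fix: the function name processSlicesOnGpu and the stand-in body of subFunction are hypothetical, and whether it removes the memory growth described in the question has not been verified here.

```matlab
function newBigArray = processSlicesOnGpu(bigArray)
% Hypothetical sketch: bigArray stays on the host and is sliced by parfor.
n = size(bigArray, 3);
newBigArray = zeros(size(bigArray), 'like', bigArray);

parfor i = 1:n
    slice = gpuArray(bigArray(:, :, i));     % bigArray is a sliced input variable
    slice = subFunction(slice);              % the computation runs on the GPU
    newBigArray(:, :, i) = gather(slice);    % sliced output, back in host memory
end
end

function out = subFunction(in)
% Stand-in for the poster's GPU-friendly function (assumption).
out = 2 * in;
end
```

The trade-off is an extra host-to-device transfer per iteration, so this only pays off if subFunction does enough work per slice to hide that cost.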
