Select a GPU to be used by a function running in parallel (parfeval)

Hello,
I am using two GPUs in a GUI that postprocesses images live.
I use the 'FramesAcquiredFcn' callback as a hub: it transfers data from the acquisition buffer to CPU RAM ('getdata') and launches a 'parfeval' function that postprocesses the data on a GPU. I have one pool with 3 workers (one for the 'FramesAcquiredFcn' and one for each GPU). I can see that when the 'parfeval' function has not finished by the next iteration, the second GPU is used instead. I also know that by calling 'wait' I can force the first GPU to be used.
However, I would like to split my postprocessing between the two GPUs, because one is more powerful than the other. I would therefore need the first 'parfeval' function to use GPU 1 and pass its outputs (using 'afterEach'?) to another 'parfeval' function that uses GPU 2.
I have read 'help selectGPU' but could not figure out how to adapt it for my purpose.
Any help would be welcome.

Accepted Answer

Joss Knight on 18 Aug 2019
I'd have to know what kind of postprocessing you're doing - please post some code. On the face of it, the answer is simply to use gpuDevice(i) to select a particular device for each of your parfeval calls.

7 Comments

Hi Joss Knight,
The issue I have with gpuDevice(i) is that it typically takes more than 1 s to execute, and I need to run the application live. Running tic;gpuDevice(2);wait(gpuDevice);toc;tic;gpuDevice(1);wait(gpuDevice);toc; gives me 1.6 s and 0.95 s, respectively. Using it would therefore slow down what I am doing considerably: in that time I would queue much more data and saturate my memory even faster.
I am looking for a way to initialize a GPU for a given worker, and then use that worker for a parallel.FevalFuture.
Typically, my first GPU postprocessing step applies fft2 to a 1920*1220*100 (single) array and crops out an area of 1920/6*1220/6*100 (single), which is passed through ifft2 and divided by a 1920/6*1220/6 (single) array preloaded on the GPU. The result then goes through fft2, is cropped to 1920/(6*2)*1220/(6*2)*100 (single), transformed with fft along the third dimension, cropped to 1920/(6*2)*1220/(6*2)*50 (single), reshaped, and ordered following the histogram principle.
The second postprocessing step consists of a few fft calls on small matrices.
If really necessary, I will provide some code.
The most reliable way is to set your gpuDevice index using SPMD before you run your parfevals. For instance, say you have 6 workers and you want to assign them to devices [1 1 1 2 2 3]:
gpuDeviceIndex = [1 1 1 2 2 3];   % device index for workers 1..6
spmd
    % labindex is the index of the worker executing this block
    gpuDevice(gpuDeviceIndex(labindex));
end
This will fix the selected devices for each worker as long as this pool is open.
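To check that the assignment took effect, something like the following could be run once after the spmd block. It is only a sketch: it assumes a pool is already open and that every worker has selected a device, and the helper name selectedGpuIndex is made up for illustration. Calling gpuDevice with no arguments only queries the current device, so it should not trigger the slow reset discussed above.

```matlab
% Ask every worker in the current pool which GPU it has selected.
f = parfevalOnAll(@selectedGpuIndex, 1);   % runs once on each worker
workerGpuIndex = fetchOutputs(f);          % one entry per worker
disp(workerGpuIndex.');

function idx = selectedGpuIndex()
% Hypothetical helper: return the index of the currently selected GPU.
dev = gpuDevice;       % query only, no index argument, so no reset
idx = dev.Index;
end
```

The result holds one index per worker, which can be compared against the counts in gpuDeviceIndex.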
Thanks for your answer Joss Knight.
The only thing I am still missing is a way to associate a worker with a parfeval function.
Where can I make this link?
Here is a mock-up code:
% Initialization
gpuDevice([]);                      % deselect the GPU on the client
handles.p = gcp('nocreate');
if isempty(handles.p)
    handles.p = parpool(4);         % 4 workers, for instance
end
gpuDeviceIndex = [1 1 2 2];
spmd
    gpuDevice(gpuDeviceIndex(labindex));
end
for ii = 1:2                        % 2 = number of futures to evaluate on GPU 1
    GPU1{ii} = parallel.FevalFuture;
end
for ii = 1:2                        % 2 = number of futures to evaluate on GPU 2
    GPU2{ii} = parallel.FevalFuture;
end
%% Further down
DATA = getdata(Camera);
GPU1{1} = parfeval(@PostProcessGPU1, 1, ...
    DATA);                          % to be forced onto GPU 1
%% Next batch of data
wait(GPU1{1})
OutputofGPU1 = fetchOutputs(GPU1{1});
GPU2{1} = parfeval(@PostProcessGPU2, 1, ...
    OutputofGPU1);                  % to be forced onto GPU 2
DATA = getdata(Camera);
GPU1{1} = parfeval(@PostProcessGPU1, 1, ...
    DATA);                          % to be forced onto GPU 1
The whole point of parfeval is to allow the scheduler to choose which worker to run the job on, based on load. If you want to schedule the work yourself then you're probably using the wrong mechanism.
That said, if you want to do different work on different GPUs you can tell which GPU you have selected by looking at the properties of the object returned by gpuDevice.
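A rough sketch of that idea: a single function is submitted with parfeval, and it branches on the Index property of the device the worker already has selected. The function name processOnSelectedGpu and the two stage functions are placeholders invented for this example; they stand in for the real PostProcessGPU1 and PostProcessGPU2, and the mapping of index 1 and 2 to stages is an assumption.

```matlab
function out = processOnSelectedGpu(data)
% Dispatch to a stage depending on which GPU this worker has selected.
dev = gpuDevice;                 % queries the current device, no reset
g = gpuArray(data);
switch dev.Index
    case 1
        out = gather(stageForGpu1(g));
    case 2
        out = gather(stageForGpu2(g));
    otherwise
        error('processOnSelectedGpu:unexpectedDevice', ...
              'Unexpected GPU index %d.', dev.Index);
end
end

function y = stageForGpu1(x)
% Hypothetical placeholder for the heavy first stage.
y = abs(fft2(x));
end

function y = stageForGpu2(x)
% Hypothetical placeholder for the lighter second stage.
y = abs(fft(x, [], 3));
end
```

Note that this only tells the function where it landed. The scheduler still picks the worker, so it does not by itself route a given batch to a given GPU.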
Tutu on 20 Aug 2019
Edited: Tutu on 20 Aug 2019
Could the scheduler choose a worker from a group whose only restriction is that they all use one particular GPU? I don't think that would contradict your statement.
You can use a third-party scheduler and give it the properties you like, but I suspect that is not really what you're after. I think the real issue here is that MATLAB has no way to switch devices without resetting the device, which is slow; this is something we intend to improve.
You may be able to use client/worker parallelism here. Do your front-end work on the client on one GPU and dispatch your background work to a single worker that has the other GPU selected (or to multiple workers, all of which have the other GPU selected, if that's appropriate).
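A minimal sketch of that client/worker layout, under several assumptions: device 1 is the client's GPU and device 2 the background one, the random array stands in for getdata(Camera), and the fft calls are placeholders for the real PostProcessGPU1 and PostProcessGPU2. The function stage2OnWorker is a made-up name. Each device is selected once up front, so the slow gpuDevice(i) call stays out of the live loop.

```matlab
% Client: select the first-stage GPU once (index 1 is an assumption).
gpuDevice(1);

% Background pool: every worker selects the other GPU once, up front.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(1);
end
wait(parfevalOnAll(pool, @gpuDevice, 0, 2));

% Per batch: stage 1 on the client GPU, stage 2 dispatched to a worker.
data = rand(64, 64, 10, 'single');             % stand-in for getdata(Camera)
stage1 = gather(abs(fft2(gpuArray(data))));    % placeholder for stage 1
f = parfeval(pool, @stage2OnWorker, 1, stage1);
result = fetchOutputs(f);                      % blocks until stage 2 is done

function out = stage2OnWorker(in)
% Hypothetical placeholder for stage 2; uses the worker's selected GPU.
out = gather(abs(fft(gpuArray(in), [], 3)));
end
```

In a live GUI, the blocking fetchOutputs call would probably be replaced by afterEach on the future, so that the callback returns immediately.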
Thanks for your answer, Joss Knight. It is not what I am after, but I understand that MATLAB does not yet offer a way to do this without resetting the whole device. That is what I needed to know.


Release: R2019a
Asked: 16 Aug 2019
Commented: 21 Aug 2019
