Fetching outputs from different GPUs results in an error

Question

Srinidhi Ganeshan on 27 Jan 2019


https://es.mathworks.com/matlabcentral/answers/441747-fetching-outputs-from-different-gpu-s-results-in-an-error

Edited: Joss Knight on 30 Jan 2019

I have two GPUs in my computer and want to use both of them to run a function. To do that, I send part of the array to one GPU and the rest to the second GPU.

Agpu1=gpuArray(A(:,:,1:n/2));    %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n));    %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@Function,2,Agpu1,1); 
F(2)=parfeval(@Function,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false);  % Blocks until complete 

When I fetch the outputs with the last statement, I get the error "Error using parallel.Future/fetchOutputs: One or more futures resulted in an error".

1) Does this mean that fetchOutputs is trying to fetch the output while the other GPU is still performing the operation? How can I solve this?

2) https://www.mathworks.com/matlabcentral/answers/162421-how-to-use-multiple-gpus-asynchronously#comment_663649

When I try the code from the link above and print the gpuDevice in use, it always shows that GPU 2 is being used and GPU 1 is idle. How can I confirm that both GPUs are being used?

Thank you!

3 Comments

Joss Knight on 28 Jan 2019

First, what is the error? Display F.Error to find out. Secondly, how many workers are in your pool? If there's only one worker, then of course every call to parfeval will use the same GPU.
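As a rough sketch of the first point, the error stored on each future can be read from its Error property once the future has finished. The two tasks below are hypothetical stand-ins (one fails on purpose), and the sketch assumes Parallel Computing Toolbox with a pool that is open or can be started automatically.

```matlab
% Hypothetical tasks: the first fails on purpose, the second succeeds.
F(1) = parfeval(@() error('demo:fail', 'Deliberate failure'), 0);
F(2) = parfeval(@() magic(3), 1);

wait(F);   % block until both futures reach the 'finished' state

% Inspect each future individually instead of calling fetchOutputs,
% which only reports that "one or more futures resulted in an error".
for k = 1:numel(F)
    err = F(k).Error;
    if isempty(err)
        fprintf('Future %d finished without error.\n', k);
    else
        fprintf('Future %d failed: [%s] %s\n', k, err.identifier, err.message);
    end
end
```

Looping over the futures like this shows which task failed and why, which is the information fetchOutputs hides behind its generic message.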

Thirdly, GPUs are dealt out to pool workers in a round-robin fashion, but parfeval gives you no ability to select which worker, and therefore which GPU, will be used. If you have four pool workers and two GPUs, and you invoke parfeval twice, you might get workers 1 and 3, which will have the same GPU selected.
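To see which GPU each pool worker currently has selected, something like the following could be run. This is only a sketch: it assumes a pool can be opened and that every worker can see a supported GPU. It uses labindex to match the naming in this thread (newer releases call it spmdIndex), and gpuIndex is a variable name chosen here for illustration.

```matlab
pool = gcp;   % use the current pool, or start one with the default settings

spmd
    d = gpuDevice;        % no arguments: returns the currently selected device
    gpuIndex = d.Index;   % becomes a Composite on the client, one value per worker
    fprintf('Worker %d: GPU %d (%s)\n', labindex, d.Index, d.Name);
end

% Collect the per-worker device indices on the client.
idx = [gpuIndex{:}];
disp(idx);
```

If several entries of idx are equal, those workers share a GPU, so two parfeval calls that land on them would not use both devices.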

One solution is to select the device manually in your function using gpuDevice, which will ensure a particular GPU is used. (By the way, I hope your function isn't actually called function because that's a keyword.)

Another would be to open a pool with a single worker, and use the client for the other half of the computation. This would help with data transfer since you don't need to transfer half of the array to another process.
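A minimal sketch of that idea, under stated assumptions: two supported GPUs, Parallel Computing Toolbox, and a hypothetical helper named pagewiseQR that is not part of the original code. The worker receives its half as a CPU array through parfeval, and the client processes the other half while the worker is busy.

```matlab
A = rand(64, 64, 10);    % small CPU array standing in for the real data
n = size(A, 3);
half = floor(n/2);

if isempty(gcp('nocreate'))
    parpool(1);          % a single worker; the client does the other half
end

% Start the worker's half asynchronously. The data is sent as a CPU array.
F = parfeval(@pagewiseQR, 2, A(:,:,half+1:n), 2);

% The client works on its own half while the worker is running.
[q1, r1] = pagewiseQR(A(:,:,1:half), 1);

% Block until the worker is done, then stitch the pages back together.
[q2, r2] = fetchOutputs(F);
Q = cat(3, q1, q2);
R = cat(3, r1, r2);

function [q, r] = pagewiseQR(A, deviceIdx)
    % Hypothetical helper: economy-size QR of each page on the given GPU.
    % Switch devices only when needed, before any gpuArray is created here.
    d = gpuDevice;
    if d.Index ~= deviceIdx
        gpuDevice(deviceIdx);
    end
    for i = size(A, 3):-1:1          % reverse loop sizes q and r on the first pass
        [qg, rg] = qr(gpuArray(A(:,:,i)), 0);
        q(:,:,i) = gather(qg);
        r(:,:,i) = gather(rg);
    end
end
```

Because parfeval returns immediately, the client's call to pagewiseQR overlaps with the worker's, and only the worker's half of the data crosses a process boundary.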

The 'correct' solution (if there really is one) is to use SPMD, since you want both workers to be doing exactly the same thing with different data. As long as you have a pool of 2 workers you will guarantee that both are using different GPUs, and you won't even need to have a separate function. Again, no point in copying the data to the GPU before opening the SPMD block, because that will in fact slow down the data transfer rather than speeding it up.

Srinidhi Ganeshan on 29 Jan 2019

Below is the code:

for i=1:500
    A(:,:,i)=rand(500,500); 
end
Agpu1=gpuArray(A(:,:,1:n/2));      %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n));    %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1); 
F(2)=parfeval(@fcn,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false);  % Blocks until complete 
function [q,r]=fcn(A,Id)
if nargin>1, gpuDevice(Id);end
     for i=size(A,3):-1:1  
         [q(:,:,i),r(:,:,i)]=qr(A(:,:,i),0);
     end
end

1) a) Error:

ans =
  ParallelException with properties:
     identifier: 'parallel:gpu:array:InvalidData'
        message: 'The data no longer exists on the device.'
          cause: {}
    remotecause: {[1x1 MException]}
          stack: [1x1 struct]

2) I am using 16 workers. In this case, how will parfeval use the GPUs?

3) In my program, I selected different GPUs using the gpuDevice ID. When I do that and run my program, I get an error at line 5, i.e. at fetchOutputs. The error message is shown above.

4) Thanks for mentioning that. The function is not actually called function in my program.

5) How do I do this: "Another would be to open a pool with a single worker, and use the client for the other half of the computation. This would help with data transfer since you don't need to transfer half of the array to another process."? Is there a small example you could provide?

6) To solve (3), I tried using wait, one of the methods of parallel.FevalFuture, like this:

Agpu1=gpuArray(A(:,:,1:n/2));      %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n));    %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1); 
F(2)=parfeval(@fcn,2,Agpu2,2);
wait(F,'finished');
[o1,o2] = fetchOutputs(F,'UniformOutput',false);  % Blocks until complete 

I still get the same error.

I also tried using fetchNext, so that each result arrives as soon as its job completes:

Q1=cell(1,2);
R1=cell(1,2);
for idx=1:2
    [completedIdx,Q,R] = fetchNext(F);
    disp(completedIdx);
    Q1{completedIdx}=Q;
    R1{completedIdx}=R;
end
toc
Q=cat(3,gather(Q1{1}),gather(Q1{2}));
R=cat(3,gather(R1{1}),gather(R1{2}));

Even though I do this, I get the same error:

One or more futures resulted in an error.

What should I do to solve this?

To sum up, I am planning to do a small part of my QR on the CPU and split the rest between the GPU devices, so that the CPU, GPU 1 and GPU 2 are all kept busy at the same time.

Joss Knight on 30 Jan 2019

Edited: Joss Knight on 30 Jan 2019

You can try to use the same GPUs on more than one parallel worker, but it's pointless - the work will happen in serial. If you have two GPUs, open a pool with two workers. If you want to do some work on the GPU and some on the CPU, take a look at the answer to this question.

The error is a pretty simple one. Every time you select the device using gpuDevice, you are resetting it, clearing all gpuArray variables in memory, including the ones you passed in. As I said, there is no point in moving the data to the GPU on the client MATLAB and then sending it to your worker in a parfeval call. All that happens is that the data gets transferred back to the system memory, then transmitted to the other process, then deserialised and put back on whatever device is currently selected. Create your data on your worker or send it as a CPU array and then transfer it to the GPU at the other end. You could also try using a parallel.pool.Constant to define data on your workers that persists from call to call.
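Applied to the parfeval code from the question, that advice might look roughly like the sketch below. It assumes two supported GPUs and uses a small random array in place of the real data. The function name fcn and its Id argument follow the question's code. The variable half and the per-page gather are additions made here for illustration.

```matlab
A = rand(64, 64, 10);            % CPU array; nothing is moved to a GPU on the client
n = size(A, 3);
half = floor(n/2);

if isempty(gcp('nocreate'))
    parpool(gpuDeviceCount);     % one worker per GPU
end

% Send plain CPU chunks; each worker moves its chunk to its own GPU.
F(1) = parfeval(@fcn, 2, A(:,:,1:half), 1);
F(2) = parfeval(@fcn, 2, A(:,:,half+1:n), 2);
[o1, o2] = fetchOutputs(F, 'UniformOutput', false);   % blocks until complete

Q = cat(3, o1{:});
R = cat(3, o2{:});

function [q, r] = fcn(A, Id)
    gpuDevice(Id);               % select the device first, while no gpuArray exists here
    Agpu = gpuArray(A);          % transfer only after the device has been selected
    for i = size(A, 3):-1:1      % reverse loop sizes q and r on the first pass
        [qg, rg] = qr(Agpu(:,:,i), 0);
        q(:,:,i) = gather(qg);   % return CPU arrays to the client
        r(:,:,i) = gather(rg);
    end
end
```

The key difference from the failing version is the order of operations inside fcn: the device is selected before any gpuArray is created on the worker, so the selection has nothing to invalidate.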

If I was trying to do pagewise QR like you are on two GPUs I'd probably use SPMD, and I probably would limit the GPU work to just the call to qr - there's no advantage to all that indexing and storage on the GPU, I don't think:

parpool('local', gpuDeviceCount);
spmd
    nPages = size(A,3);
    blocksize = ceil(nPages/numlabs);
    strt = (labindex-1)*blocksize + 1;
    fnsh = min(nPages, strt+blocksize-1);  % last page of this worker's block (inclusive)
    
    for j = fnsh:-1:strt
        Agpu = gpuArray(A(:,:,j));
        [qgpu,rgpu] = qr(Agpu, 0);
        i = j-strt+1;
        q(:,:,i) = gather(qgpu);
        r(:,:,i) = gather(rgpu);
    end
end
% q and r are now Composites so need to be indexed to recreate result
Q = cat(3, q{:});
R = cat(3, r{:});

By the way, I hope you're not actually doing this:

for i=1:500
    A(:,:,i)=rand(500,500); 
end

Since it's just the same as A = rand(500,500,500), but way slower.
