Optimize GPU code with nested pagemtimes

tiwwexx on 28 Jul 2022
Commented: tiwwexx on 29 Jul 2022
Hello all,
I'm trying to speed up a computation using the GPUs that are available to me. Right now I have two arrays, Q and W:
size(W) = [16 1 1000]
size(Q) = [16 16 1 2000]
I want to do a pseudo-matrix multiplication M = W'*Q*W for every combination of a page of W and a page of Q, to get size(M) = [1000 2000].
To do this I use two calls to pagemtimes, which can run on the GPU. Here's the code:
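%%
tic
% Preallocate the result on the GPU: one num_psar_kept-by-2 slice per shim
Sar_pm_gpu = zeros(num_psar_kept, 2, size(shim_pm_gpu,3), 'single', 'gpuArray');
for n = 1:size(shim_pm_gpu,3)
    % Q*W for the n-th shim vector, broadcast over all pages of Q
    inter_calc = pagemtimes(Q_gpu, shim_pm_gpu(:,1,n));
    % W'*(Q*W), using the pre-transposed copy of the shim vector
    Sar_this_shim = squeeze(pagemtimes(shim_pm_gpu_left(:,:,n), inter_calc)); % in a test, this one is ~15% faster
    [Sar_maxk, index_maxk] = max(Sar_this_shim);
    Sar_pm_gpu(:,:,n) = [Sar_maxk, index_maxk];
end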
With this code I get a ~5x speedup versus running it on the CPU. However, I'd expect it to be quite a bit faster than that. I then checked nvidia-smi, and the power consumption on the GPU was ~35 W. For reference, the resting power consumption is 30 W, so I don't think this code is actually utilizing the GPU. If anyone sees a way to speed this up, it would be much appreciated! (An explanation of why the GPU power consumption is so low with the posted code would also be much appreciated; I assume it has something to do with memory.)
  2 Comments
Matt J on 28 Jul 2022
You shouldn't be using tic/toc for timing gpuArray operations,
tiwwexx on 29 Jul 2022
I clipped off the end of the code by accident. I make sure to call
gather(output)
before calling toc, so the timing is accurate.
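For readers following the timing discussion, here is a minimal sketch of two common ways to time gpuArray code. It is not the poster's benchmark: the arrays are random placeholders with the sizes from the question, and the function handle f is a hypothetical stand-in for one loop iteration. It assumes Parallel Computing Toolbox and falls back to timeit on the CPU when no GPU is available.

```matlab
% Placeholder data with the sizes from the question (values are random).
W = rand(16, 1, 1000, 'single');
Q = rand(16, 16, 1, 2000, 'single');

% Hypothetical workload: Q*W for the first page of W only.
f = @(Q, W) pagemtimes(Q, W(:, 1, 1));

if canUseGPU()
    Qg = gpuArray(Q);
    Wg = gpuArray(W);

    % Option 1: gputimeit calls the function several times and makes sure
    % the GPU has finished before each measurement is taken.
    t = gputimeit(@() f(Qg, Wg));

    % Option 2: tic/toc with an explicit synchronization before toc.
    dev = gpuDevice;
    tic
    out = f(Qg, Wg);
    wait(dev);              % block until all queued GPU work is done
    tManual = toc;
    fprintf('gputimeit: %.3g s, tic/toc + wait: %.3g s\n', t, tManual);
else
    % No GPU available: time the same workload on the CPU instead.
    t = timeit(@() f(Q, W));
    fprintf('timeit (CPU): %.3g s\n', t);
end
```

Calling gather before toc, as described above, also forces the GPU to finish, but it adds the device-to-host transfer time to the measurement.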


Accepted Answer

Matt J on 28 Jul 2022
I don't think you need either a loop or a second pagemtimes call.
Wr=reshape(W,16,1000);
Qr=reshape(Q,16,16,2000);
M=sum(pagemtimes(Qr,Wr).*Wr,1);
M=reshape(M,1000,2000);
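A quick way to convince yourself that the loop-free expression matches W'*Q*W is to compare it against an explicit double loop on small arrays. The sketch below runs on the CPU with hypothetical sizes nC, nW, nQ and random complex data. One assumption to flag: in MATLAB, W' is the conjugate transpose, so for complex W the element-wise factor needs conj(Wr); for real W the conj is a no-op and the expression reduces to the one in the answer.

```matlab
% Small placeholder sizes so the check runs quickly on the CPU.
nC = 4;  nW = 5;  nQ = 6;
rng(0);
W = complex(randn(nC, 1, nW),     randn(nC, 1, nW));
Q = complex(randn(nC, nC, 1, nQ), randn(nC, nC, 1, nQ));

% Reference: explicit double loop over every (W page, Q page) pair.
Mref = zeros(nW, nQ);
for i = 1:nW
    for j = 1:nQ
        Mref(i, j) = W(:, 1, i)' * Q(:, :, 1, j) * W(:, 1, i);
    end
end

% Loop-free form from the answer, with conj() added for complex W.
Wr = reshape(W, nC, nW);                       % nC-by-nW
Qr = reshape(Q, nC, nC, nQ);                   % nC-by-nC-by-nQ
M  = sum(pagemtimes(Qr, Wr) .* conj(Wr), 1);   % 1-by-nW-by-nQ
M  = reshape(M, nW, nQ);

% Largest absolute difference between the two results.
maxErr = max(abs(M(:) - Mref(:)));
fprintf('max abs difference: %.3g\n', maxErr);
```

The same expression should work unchanged when W and Q are gpuArrays, since pagemtimes, sum, and element-wise multiplication all accept gpuArray inputs.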
  4 Comments
Matt J on 28 Jul 2022
Edited: Matt J on 28 Jul 2022
It seems to be slower only on the GPU. pagemtimes isn't well-optimized for the GPU, it would appear.
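If pagemtimes turns out to be the bottleneck on the GPU, one alternative worth trying is to rewrite the whole computation as a single ordinary matrix multiplication, which GPUs generally handle well. This is a sketch and has not been benchmarked here; nC, nW, nQ and the intermediate names P, A, Qm are hypothetical. The idea is that w'*Q*w equals the sum over (k,l) of conj(w(k))*w(l)*Q(k,l), so the outer products of each weight vector can be flattened into the rows of one matrix and multiplied by the flattened pages of Q.

```matlab
% Small placeholder sizes; in the question nC = 16, nW = 1000, nQ = 2000.
nC = 4;  nW = 5;  nQ = 6;
rng(0);
W = complex(randn(nC, 1, nW),     randn(nC, 1, nW));
Q = complex(randn(nC, nC, 1, nQ), randn(nC, nC, 1, nQ));

Wr = reshape(W, nC, nW);

% P(k,l,i) = conj(W(k,i)) * W(l,i): one outer product per weight vector.
P  = reshape(conj(Wr), nC, 1, nW) .* reshape(Wr, 1, nC, nW);

% Flatten each outer product into a row, and each page of Q into a column.
A  = reshape(P, nC*nC, nW).';     % nW-by-nC^2 (plain transpose, no conj)
Qm = reshape(Q, nC*nC, nQ);       % nC^2-by-nQ

% One large matrix product replaces the loop and both pagemtimes calls.
M  = A * Qm;                      % nW-by-nQ

% Cross-check against the pagemtimes formulation.
Mcheck = sum(pagemtimes(reshape(Q, nC, nC, nQ), Wr) .* conj(Wr), 1);
Mcheck = reshape(Mcheck, nW, nQ);
maxErr = max(abs(M(:) - Mcheck(:)));
fprintf('max abs difference: %.3g\n', maxErr);
```

To try it on the GPU, create W and Q with gpuArray before the reshape calls. At the sizes in the question, A would be 1000-by-256 and Qm 256-by-2000, so the extra memory is small.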
tiwwexx on 28 Jul 2022
Hmm, very interesting indeed. I have a feeling that I'm eventually going to need to learn CUDA since I run into these problems quite often...


Release

R2021b
