Odd Profiler GPU Bug?

Christopher Kanan on 2 Nov 2015
Answered: Joss Knight on 27 Nov 2015
I have a loop that does some operations on the GPU. When I call the function from the command line, each epoch runs in 96 seconds. When I run it under the profiler, each epoch takes about 4 seconds. The CPU version of the code runs in about 63 seconds. Any ideas on how to fix this so that I get the GPU speedup without the profiler? I could run it under the profiler exclusively, but that seems silly. Here is the portion of code that seems to be affected by this phenomenon. The r array is large, e.g., 4000x78000.
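tic
r = gpuArray(single(r));        % move r to the GPU in single precision
U_grad = gpuArray(single(0));   % scalar accumulator; expands on the first addition
W = gpuArray(single(W'));
for k = 1:max_class
    % each bsxfun call allocates a temporary at least as large as r
    U_grad = U_grad + bsxfun(@times, W(:, k), r);
end
U_grad = double(gather(U_grad)) * (q');   % bring the result back to the CPU as double
toc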
2 comments
Edric Ellis on 2 Nov 2015
Any chance you could post a standalone reproduction? What version of MATLAB are you using? What OS? What GPU?
Christopher Kanan on 2 Nov 2015
I found a bug that obviated the need for this loop, so I don't really need my question answered anymore. It looks like the GPU was failing to allocate memory when the script was run from the command line, but was able to allocate it when the script was run under the profiler. When I put in a breakpoint and inspected the variables, they had a warning about the GPU memory.
Still, it seems strange that it was working under the profiler and not from the command line.
My setup: NVIDIA Titan (original, not X, etc.), Windows 10, MATLAB R2015a
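For anyone hitting a similar allocation warning, a rough back-of-the-envelope check can help. The sketch below estimates the footprint of one 4000x78000 single-precision array (the size quoted in the question) and compares it with what the device reports as available. The `AvailableMemory` property name is the one used in recent releases and is an assumption for older ones; the rest is plain arithmetic.

```matlab
% Rough memory estimate for one array of the size quoted in the question.
bytes_per_single = 4;
r_bytes = 4000 * 78000 * bytes_per_single;      % 1,248,000,000 bytes
fprintf('One 4000x78000 single array: %.2f GB\n', r_bytes / 1e9);

% Compare with what the current device reports (requires a supported GPU).
% AvailableMemory is assumed here; check the gpuDevice properties in your release.
dev = gpuDevice;
fprintf('Available on device: %.2f GB\n', dev.AvailableMemory / 1e9);
```

With r, the accumulator, and one or two temporaries of the same size resident at once, the total quickly approaches several GB, so it is plausible for an allocation to fail on a card that is also driving a display.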


Answers (1)

Joss Knight on 27 Nov 2015
When the profiler is on (from MATLAB R2015a onwards), the GPU is forced to run synchronously so that the reported timings are realistic.
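The same concern applies to manual timing with tic/toc: because GPU calls can return before the work has finished, the clock should only be read after the device has been synchronized. A minimal sketch, using small stand-in arrays (the sizes and the names `w`, `out`, `t_manual`, and `t_auto` are hypothetical, not from the original code):

```matlab
% Stand-in data; the original r was roughly 4000x78000.
r = gpuArray(rand(400, 7800, 'single'));
w = gpuArray(rand(400, 1, 'single'));
dev = gpuDevice;

% Manual timing: synchronize before stopping the clock.
tic
out = bsxfun(@times, w, r);
wait(dev);                 % block until queued GPU work has completed
t_manual = toc;

% gputimeit handles synchronization and repeated runs itself.
t_auto = gputimeit(@() bsxfun(@times, w, r));

fprintf('tic/toc with wait: %.4g s, gputimeit: %.4g s\n', t_manual, t_auto);
```

Without the `wait(dev)` call, `toc` may report only the time taken to queue the operation rather than the time taken to run it.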
My guess is that without the profiler, your second call to bsxfun is launched before the first has finished (this is possible because the product does not depend on the output of the previous iteration). MATLAB attempts to allocate space for an output array of the appropriate size, but there isn't enough space available for two arrays of that size.
With the profiler on, the space used by the first call to bsxfun has already been freed and is available to the second call.
You can confirm this is the case by calling wait(gpuDevice) inside your loop to prevent this execution overlap.
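As a sketch of that check, the loop from the question might look like the following with the synchronization added. The array sizes are small stand-ins, and the shape of `W` is an assumption chosen so that `W(:, k)` broadcasts across the columns of `r`; the original shapes were not posted.

```matlab
% Stand-in sizes (hypothetical); the original r was roughly 4000x78000.
n_rows = 400; n_cols = 7800; max_class = 10;
r = gpuArray(rand(n_rows, n_cols, 'single'));
W = gpuArray(rand(n_rows, max_class, 'single'));   % assumed shape
U_grad = zeros(n_rows, n_cols, 'single', 'gpuArray');

dev = gpuDevice;
tic
for k = 1:max_class
    U_grad = U_grad + bsxfun(@times, W(:, k), r);
    wait(dev);   % finish this iteration's GPU work before queuing the next
end
toc
```

If the per-epoch time from the command line drops to something close to the profiler figure once `wait(dev)` is in place, that supports the overlapping-allocation explanation.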
