MATLAB Answers

Odd Profiler GPU Bug?

Christopher Kanan on 2 Nov 2015
Answered: Joss Knight on 27 Nov 2015
I have a loop that does some operations on the GPU. When I call the function from the command line, each epoch takes 96 seconds. When I run it under the profiler, each epoch takes about 4 seconds. The CPU version of the code runs in about 63 seconds. Any ideas on how to get the GPU speedup without the profiler? I could run it under the profiler exclusively, but that seems silly. Here is the portion of the code that seems to be affected. The r array is large, e.g., 4000x78000.
tic
r = gpuArray(single(r));
U_grad = gpuArray(single(0));
W = gpuArray(single(W'));
for k = 1:max_class
    U_grad = U_grad + bsxfun(@times, W(:, k), r);
end
U_grad = double(gather(U_grad)) * (q');
toc

  2 Comments

Edric Ellis on 2 Nov 2015
Any chance you could post a standalone reproduction? What version of MATLAB are you using? What OS? What GPU?
Christopher Kanan on 2 Nov 2015
I found a bug that obviated the need for this loop, so I no longer need my question answered. It looks like the GPU was failing to allocate memory when the script was run from the command line, but was able to allocate it when the script was run from the profiler. When I set a breakpoint and inspected the variables, they showed a warning about GPU memory.
But it still seems strange that it worked from the profiler and not from the command line.
My setup: NVIDIA Titan (original, not X, etc.), Windows 10, MATLAB R2015a


Answers (1)

Joss Knight on 27 Nov 2015
When the profiler is on (from MATLAB R2015a onwards), the GPU is forced to run synchronously so that the timings are realistic.
My guess is that without the profiler, your second call to bsxfun is issued before the first has finished (this is possible because it does not need the output of the previous iteration). MATLAB attempts to allocate space for an output array of the appropriate size, but there isn't enough memory available for two arrays of that size.
With the profiler on, the space used by the first call to bsxfun has already been freed and is available to the second call.
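To get a feel for the memory pressure described above, here is a rough sketch that computes the size of one single-precision array with the 4000x78000 dimensions from the question (about 1.16 GiB) and compares it with what the device reports. It assumes the Parallel Computing Toolbox is installed; the variable names are only illustrative.

```matlab
% Size of one single-precision 4000-by-78000 array (dimensions from the question).
numElements   = 4000 * 78000;        % 312,000,000 elements
bytesPerArray = numElements * 4;     % single precision = 4 bytes per element
fprintf('One array: %.2f GiB\n', bytesPerArray / 2^30);

% Compare against what the GPU reports, if a supported device is present.
if gpuDeviceCount > 0
    dev = gpuDevice;                 % currently selected GPU
    fprintf('Available: %.2f GiB of %.2f GiB\n', ...
        dev.AvailableMemory / 2^30, dev.TotalMemory / 2^30);
end
```

Each loop iteration needs at least one temporary of this size for the bsxfun result, on top of r and U_grad, so two overlapping iterations can plausibly exhaust the card.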
You can confirm that this overlap is the cause by calling wait(gpuDevice) inside your loop, which prevents the next iteration from being issued before the previous one has finished.
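A minimal sketch of that check, assuming a supported GPU is available. The small array sizes and the column-per-class layout of W are placeholders rather than the data from the question:

```matlab
% Placeholder data: much smaller than the 4000x78000 array in the question.
dev = gpuDevice;                               % currently selected GPU
r = gpuArray(rand(400, 780, 'single'));
W = gpuArray(rand(400, 10, 'single'));         % assumed: one column per class
max_class = size(W, 2);
U_grad = zeros(size(r), 'single', 'gpuArray'); % accumulator on the GPU

tic
for k = 1:max_class
    U_grad = U_grad + bsxfun(@times, W(:, k), r);
    wait(dev);   % block until all queued GPU work has finished
end
toc
```

If the slowdown disappears with the wait call in place, that points to overlapping iterations competing for memory. Calling wait on every iteration gives up some asynchronous overlap, so it is better treated as a diagnostic than as a permanent fix.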
