Matlab + CUDA slow in solving matrix-vector equation A*x=B

1 visualización (últimos 30 días)
I am calculating an equation A*x=B, where A is a matrix and B is a vector, x is answer (unknown) vector.
Hardware specs: Intel i7 3630QM (4 cores), nVidia GeForce GT 640M (384 CUDA cores)
Here's an example:
>> A=rand(5000);
>> B=rand(5000,1);
>> Agpu=gpuArray(A);
>> Bgpu=gpuArray(B);
>> tic;A\B;toc;
Elapsed time is 1.382281 seconds.
>> tic;Agpu\Bgpu;toc;
Elapsed time is 4.775395 seconds.
Somehow GPU is much slower... Why? It is also slower in FFT, INV, LU calculations, which should be related with matrix division.
However, GPU is much faster in matrix multiplication (with the same data):
>> tic;A*B;toc;
Elapsed time is 0.014700 seconds.
>> tic;Agpu*Bgpu;toc;
Elapsed time is 0.000505 seconds.
The main question is why GPU A\B (mldivide) is so slow comparing to CPU?
  1 comentario
Aurimas
Aurimas el 16 de Feb. de 2013
Here are some more results when A, B (on CPU), AA, BB (on GPU) are rand(5000):
>> tic;fft(A);toc;
Elapsed time is *0.117189 *seconds.
>> tic;fft(AA);toc;
Elapsed time is 1.062969 seconds.
>> tic;fft(AA);toc;
Elapsed time is 0.542242 seconds.
>> tic;fft(AA);toc;
Elapsed time is *0.229773* seconds.
>> tic;fft(AA);toc;
Bold times are stable times. However GPU is almost twice slower. By the way, why GPU is even more slower on first two attempts? Is it compiled twice firstly?
In addition:
>> tic;sin(A);toc;
Elapsed time is *0.121008* seconds.
>> tic;sin(AA);toc;
Elapsed time is 0.020448 seconds.
>> tic;sin(AA);toc;
Elapsed time is 0.157209 seconds.
>> tic;sin(AA);toc;
Elapsed time is *0.000419 *seconds
After two calculations GPU is incredibly faster in sin calculations.
So, still, why GPU is so slow in matrix division, fft and similar calculations, though it is so fast in matrix multiplication and trigonometry? The question actually should not be like that... GPU should be faster in all these calculations because Matlab has released overlapped functions (mldivide, fft) for GPU.
Could somebody help me solve these issues, please? :)

Iniciar sesión para comentar.

Respuesta aceptada

Jill Reese
Jill Reese el 19 de Feb. de 2013
To get accurate timings for GPU calculations you need to be sure to wait for the GPU to finish. You should modify all your timings accordingly:
g = gpuDevice();
tic;
f = fft(A);
wait(g);
toc;
Also, not all GPUs are created equal. To get a sense of what performance you can expect from your GPU in relation to other devices out there, you may want to run gpuBench which can be found here:

Más respuestas (0)

Categorías

Más información sobre GPU Computing en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by