- The second "wait(gpu)" inside your tight loop is not needed and will affect the results. Memory transfers from device to host (i.e. "gather") are always synchronous.
- You are measuring the speed of transferring data to and from the GPU (i.e. the speed of the PCI bus). This is not the same as the GPU memory bandwidth suggested by the question title, which is much higher (>90GB/sec for your GPU, and higher still for a recent GPU).
- It is nearly impossible to measure the transfer bandwidth accurately from within MATLAB. What you are actually timing is the time taken to allocate some space (on the GPU in the first case, in host memory in the second), perform the data transfer, and assign a MATLAB variable. These extra steps take some (hopefully small) amount of time, which lowers the measured bandwidth.
- Some of the variability may come from other processes using the PCI bus. Running your OS in a highly stripped-down mode with no network etc. might help.
How to measure GPU memory bandwidth?
I have a Tesla C1060 with 4 GB of memory. I am running MATLAB R2012b and using the following code to measure the memory bandwidth between host and device.
gpu = gpuDevice()
N=8192; data = rand(N,N); % N^2 doubles, 0.5 GiB of host data
for k=1:100
tic;
gdata = gpuArray(data); wait(gpu);
CPU2GPU(k) = N^2*8/1024^3/toc;
tic;
data2 = gather(gdata); wait(gpu);
GPU2CPU(k) = N^2*8/1024^3/toc;
end
figure;
plot(1:100,CPU2GPU,'r.',1:100,GPU2CPU,'b.');
legend('CPU->GPU','GPU->CPU');
I measured less than 1.5 GB/s from GPU to CPU and less than 3.0 GB/s from CPU to GPU (averaging the 100 values, except the very first ones). 1) Why are the measured values so far from the expected 8 GB/s? The 100 values also vary from one run to another by a factor of almost 2. 2) Why is the behavior of this code not reproducible?
Thanks for your help.
Accepted Answer
Ben Tordoff
on 15 Apr 2013
Hi Anterrieu,
You might like to have a look at the following article:
In those results, the achieved transfer bandwidth tops out at about 5.7GB/sec (send) and 4.0GB/sec (gather). Whilst I can't give you a definitive answer as to why your measured transfer rates are so low and unreliable, here are a couple of points to consider:
If you try the code from the article and still see much lower results, let me know. Note, however, that you are not really measuring your GPU here, you are simply measuring how busy your PCI bus is and how well MATLAB can throw data at it. It's an important measure, but it's not usually the most important one, so long as you do plenty of calculations with your data once you've put it on the GPU. If you want to know more about your GPU's calculation performance, you might like to take GPUBench for a spin:
Ben
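The points in the answer about the redundant "wait(gpu)" after "gather" and about the overhead included in each timing can be sketched as a revised timing loop. This is only a minimal illustration, not the code from the article mentioned in the answer. The variable names (numReps, sendTime, gatherTime, sizeGiB), the warm-up transfer, and the choice to report the fastest repeat are assumptions made here. Allocation and variable assignment are still included in each timing, so the figures remain a lower bound on what the bus can deliver.

```matlab
% Sketch (assumptions: names and the best-of-N reporting are illustrative).
gpu  = gpuDevice();
N    = 8192;
data = rand(N, N);                       % 8192^2 doubles
sizeGiB = numel(data) * 8 / 1024^3;      % 0.5 GiB per transfer

numReps    = 20;
sendTime   = zeros(1, numReps);          % preallocate so the loop does not grow arrays
gatherTime = zeros(1, numReps);

% Warm-up transfer so one-off initialisation cost is not timed.
gdata = gpuArray(data);
wait(gpu);

for k = 1:numReps
    t = tic;
    gdata = gpuArray(data);
    wait(gpu);                           % make sure the host-to-device copy has finished
    sendTime(k) = toc(t);

    t = tic;
    data2 = gather(gdata);               % gather is synchronous, so no wait(gpu) here
    gatherTime(k) = toc(t);
end

% The fastest repeat is the one least disturbed by other bus traffic.
fprintf('send:   %.2f GiB/s (best of %d)\n', sizeGiB / min(sendTime),   numReps);
fprintf('gather: %.2f GiB/s (best of %d)\n', sizeGiB / min(gatherTime), numReps);
```

Comparing the minimum with the median of sendTime and gatherTime gives a rough idea of how much of the run-to-run spread comes from interference rather than from the transfer itself.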