Sum of squares profiling on GPU
I was profiling some code that runs on my GPU and came across something rather puzzling that I haven't been able to sort out. Maybe it has something to do with the way the profiler interacts with the GPU, so I also tried it on the CPU and got very different results. Here is the code:
clear all
g = gpuArray.rand(600, 600, 400, 'single');  % 600x600x400 single array on the GPU
for i = 1:100
    x = sum(g, 3)/400;     % mean over the 3rd dimension
    gSq = g.^2;            % element-wise square
    y = sum(gSq, 3)/400;   % mean of the squares
    g = g + .01;
end
This code is just an example that reproduces the problem, not the actual code I am running, so don't worry about why anyone would do this...
On the GPU the profiler shows basically ALL of the time is spent on the line
y = sum(gSq, 3)/400;
On the CPU, the profiler shows most of the time being spent on
g = g+.01;
and the remainder of the time is evenly distributed among the other lines.
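To take the profiler's per-line attribution out of the picture, a cross-check along these lines should isolate each operation (this is just a sketch; gputimeit synchronizes the GPU around each call, so I'm assuming its numbers reflect the actual kernel times rather than where the queue happens to block):

% Time each operation in isolation with gputimeit, which forces GPU synchronization.
g   = gpuArray.rand(600, 600, 400, 'single');
gSq = g.^2;
tMean   = gputimeit(@() sum(g, 3)/400);    % the "x" line
tSquare = gputimeit(@() g.^2);             % the "gSq" line
tMeanSq = gputimeit(@() sum(gSq, 3)/400);  % the "y" line
tAdd    = gputimeit(@() g + .01);          % the "g = g + .01" line
fprintf('mean: %.4f s, square: %.4f s, mean of squares: %.4f s, add: %.4f s\n', ...
    tMean, tSquare, tMeanSq, tAdd);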
Why is summing the gSq array so expensive on the GPU relative to summing g (the x = sum(g, 3)/400 line)? The two arrays are the same size... I don't think it is a memory issue, since my GPU has 4 GB of memory and almost 3 GB is still available with g, x, gSq and y in memory.
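For reference, free GPU memory can be checked roughly like this (using the AvailableMemory and TotalMemory properties of gpuDevice):

% Report free vs. total memory on the current GPU device.
d = gpuDevice;
fprintf('GPU memory: %.2f GB free of %.2f GB total\n', ...
    d.AvailableMemory/2^30, d.TotalMemory/2^30);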
Any ideas?