How do I know how large an array can fit on the GPU?

Jae-Hee Park on 26 Aug 2022
Commented: Joss Knight on 1 Sep 2022
Hi,
I am trying to run some analysis on the GPU using functions such as fft().
But the array is too large to calculate on my GPU (TITAN Xp).
So I thought of slicing the array, putting each slice on the GPU, and then collecting and reshaping the results after the calculation.
But I don't know what size fits on my GPU.
How can I find out what array size fits on my GPU?
Thank you.
Jae-Hee Park
  2 Comments
Jae-Hee Park on 26 Aug 2022
My gpuDevice returns the following. What can I do next?
Name: 'NVIDIA TITAN Xp'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.7
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.2885e+10
AvailableMemory: 1.1665e+10
MultiprocessorCount: 30
ClockRateKHz: 1582000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1


Accepted Answer

Mike Croucher on 26 Aug 2022
Edited: Mike Croucher on 26 Aug 2022
As you've seen, gpuDevice() gives you information about your GPU. This is what I get for mine:
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5894e+09
AvailableMemory: 7.2955e+09
MultiprocessorCount: 46
ClockRateKHz: 1725000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
The important parameter here is AvailableMemory. I have 7.2955e+09 bytes (you have rather more!). What does this mean in terms of matrix size?
A double-precision number is 8 bytes, so in theory I can have 7.2955e+09/8 = 911937500 doubles on the card. This is a hard limit that I can do nothing about: there simply isn't the capacity on my GPU to hold more than that. Consider it an upper bound. In terms of a square matrix, it's roughly 30,000 x 30,000, since
sqrt(911937500)
ans =
3.0198e+04
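The same arithmetic can be applied to the AvailableMemory value reported in the question (1.1665e+10 bytes on the TITAN Xp). The following is only a rough sketch: the variable names are made up for illustration, and the element sizes assume dense arrays of 8-byte doubles, 4-byte singles, and 16-byte complex doubles (the type that fft() typically returns for double input).

```matlab
% AvailableMemory value copied from the gpuDevice output in the question.
availableBytes = 1.1665e+10;

% Bytes per element for some common dense array types.
bytesDouble        = 8;
bytesSingle        = 4;
bytesComplexDouble = 16;   % real and imaginary parts, 8 bytes each

% Upper bound on the element count and the matching square matrix side.
maxDoubles        = floor(availableBytes / bytesDouble);
sideDouble        = floor(sqrt(maxDoubles));
sideSingle        = floor(sqrt(availableBytes / bytesSingle));
sideComplexDouble = floor(sqrt(availableBytes / bytesComplexDouble));

fprintf('double:         %d elements, about %d x %d\n', maxDoubles, sideDouble, sideDouble);
fprintf('single:         about %d x %d\n', sideSingle, sideSingle);
fprintf('complex double: about %d x %d\n', sideComplexDouble, sideComplexDouble);
```

These numbers are upper bounds on what can be stored, not on what can be computed with.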
Let's transfer a matrix that big to my GPU and see if I'm successful
a = zeros(3.0198e+04);
>> gpuA = gpuArray(a);
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5894e+09
AvailableMemory: 110592
MultiprocessorCount: 46
ClockRateKHz: 1725000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
It worked, and I had 110592 bytes left over.
However, the useful limit will be rather lower than this. If I stuff my card full of data then there's no room for any GPU algorithm to do any computation. Even adding 1 to all the elements of a GPU array this big is too much. Clearly matrix addition isn't done completely in place.
gpuA = gpuA +1;
Error using +
Out of memory on device. To view more detail about available memory on the GPU,
use 'gpuDevice()'. If the problem persists, reset the GPU by calling
'gpuDevice(1)'.
I can at least do something though. The sum command works, for example, even though the answer isn't very interesting in this case.
>> sum(gpuA,'all')
ans =
0
How much memory you need to do computations depends on the algorithms involved, but hopefully you can use this thinking as a starting point for what you can expect to squeeze onto your GPU.
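For the slicing approach described in the question, here is a minimal sketch of how it might look. It assumes that the FFT is taken along the first dimension, so that every column is independent and the array can be split by columns. The example data, the variable names, and the safety factor of 4 are assumptions chosen for illustration, not measured values; the right amount of headroom depends on the algorithm.

```matlab
% Hypothetical example data: each column is an independent signal.
x = rand(4096, 2000);             % stand-in for the large host array
y = complex(zeros(size(x)));      % host buffer for the complex result

g = gpuDevice();
bytesPerElement = 16;             % fft output is complex double
safetyFactor    = 4;              % assumed headroom for input, output and workspace
maxElems     = floor(g.AvailableMemory / (bytesPerElement * safetyFactor));
colsPerChunk = max(1, floor(maxElems / size(x, 1)));

for first = 1:colsPerChunk:size(x, 2)
    last = min(first + colsPerChunk - 1, size(x, 2));
    xg = gpuArray(x(:, first:last));   % copy one slice to the GPU
    yg = fft(xg);                      % column-wise FFT on the GPU
    y(:, first:last) = gather(yg);     % copy the result back to host memory
end
```

If an out-of-memory error still appears, increasing safetyFactor makes each chunk smaller. Splitting by columns only works this way because fft() along the first dimension treats each column separately; a multidimensional transform such as fft2 over the whole array cannot be sliced so simply.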
  1 Comment
Joss Knight on 1 Sep 2022
Just FYI, MATLAB won't allow in-place computation on a workspace variable because it needs to hold onto the original array in case of error (or user Ctrl-C). Computation inside a function on local variables will be more optimized.
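As an illustration of that point, the sketch below moves the computation into a function so that the array is a local variable. The function name sumOfIncremented is hypothetical, and whether MATLAB actually reuses the array's memory for the addition is up to its internal optimizations, so treat this as something to experiment with rather than a guarantee.

```matlab
function total = sumOfIncremented(n)
    % Create the n-by-n array directly on the GPU as a local variable.
    a = zeros(n, 'gpuArray');
    % Because a is local to this function, MATLAB may be able to
    % update it without keeping a second full copy.
    a = a + 1;
    % Reduce on the GPU and bring only the scalar result back.
    total = gather(sum(a, 'all'));
end
```

Saved as sumOfIncremented.m, it could be called with, for example, total = sumOfIncremented(1000).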
