gpuArray large sparse arrays. Error codes: "CUSPARSE_INTERNAL_ERROR" / "UNKNOWN_ERROR"

I have two GPUs: the first is an NVIDIA GeForce RTX 3090 Ti and the second is an NVIDIA GeForce RTX 2060 SUPER. I am running on a Linux machine with NVIDIA driver version 515.105.01 and CUDA version 11.7. I am using MATLAB R2022b (Update 7). When I create a large sparse gpuArray on the second (smaller) GPU there is no problem. When I repeat this on the first (larger) GPU, I get the error code UNKNOWN_ERROR, or sometimes CUSPARSE_INTERNAL_ERROR.
Sample code:
%% Device 2 - small sparse array - no problem
gpuDevice(2)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 2 - large sparse array - no problem
gpuDevice(2)
a = speye(10000000,10000000);
a = gpuArray(a);
%% Device 1 - small sparse array - no problem
gpuDevice(1)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 1 - large sparse array - problem!!
gpuDevice(1)
a = speye(10000000,10000000);
a = gpuArray(a);
Error:
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
Error in gpuTest (line 19)
a = gpuArray(a);
Device 2 is smaller (8 GB), so it shouldn't be able to handle larger arrays than device 1 can. Here are its details:
Name: 'NVIDIA GeForce RTX 2060 SUPER'
Index: 2
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8370192384 (8.37 GB)
AvailableMemory: 7891910656 (7.89 GB)
MultiprocessorCount: 34
ClockRateKHz: 1680000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
And here are the details of the GPU that fails (the larger one):
Name: 'NVIDIA GeForce RTX 3090 Ti'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 25431965696 (25.43 GB)
AvailableMemory: 24284102656 (24.28 GB)
MultiprocessorCount: 84
ClockRateKHz: 1950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Any help or explanation would be much appreciated.
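For scale, here is a rough sketch of how much storage the large matrix needs on the host. It assumes MATLAB's usual compressed sparse column layout for a double sparse matrix on a 64-bit machine; the layout used on the GPU may differ, so treat this only as an order-of-magnitude check against the 8 GB and 24 GB devices.

```matlab
% Host-side storage estimate for the large identity matrix from the sample code.
n = 10000000;
a = speye(n, n);

% Assumed layout (compressed sparse column, 64-bit MATLAB):
% 8 bytes per stored value + 8 bytes per row index + 8 bytes per column pointer.
expectedBytes = 16*nzmax(a) + 8*(n + 1);

% Compare the estimate with what MATLAB reports for the variable.
info = whos("a");
fprintf("whos reports %d bytes; formula gives %d bytes (about %.0f MB)\n", ...
    info.bytes, expectedBytes, expectedBytes/1e6);
```

By this estimate the matrix is a few hundred megabytes, well below the available memory of either device, which suggests the failure is not a plain out-of-memory condition.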

Answers (2)

Ayush on 26 Dec 2023
Edited: Ayush on 26 Dec 2023
I understand that you get these errors when you create a large sparse gpuArray on the first GPU, which has the higher specifications, but not on the second GPU, which has the lower specifications. I tried to reproduce the issue with my two GPUs:
  1. NVIDIA RTX 6000 Ada Generation | Total Memory: 51GB
  2. NVIDIA GeForce RTX 2080 SUPER | Total Memory: 8GB
I could reproduce the error only in releases up to R2022b. From the R2023a MATLAB release onwards, the error has been fixed.
Thanks
Ayush Jaiswal
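The observation above can be turned into a small check. This is only a sketch, not an official workaround: it flags releases older than R2023a (the cutoff reported in this answer) and wraps the transfer in try/catch so the error identifier is printed instead of stopping the script. The variable names and the warning identifier `gpuTest:oldRelease` are made up for illustration.

```matlab
% Flag releases where the failure was reported in this thread.
% isMATLABReleaseOlderThan is available from R2020b onwards.
if isMATLABReleaseOlderThan("R2023a")
    warning("gpuTest:oldRelease", ...
        "Large sparse gpuArray creation was reported to fail on some GPUs before R2023a.");
end

% Attempt the transfer on the currently selected device and record the outcome.
a = speye(10000000, 10000000);
ok = false;
try
    g = gpuArray(a);
    ok = true;
    fprintf("Transfer succeeded: %d nonzeros on the GPU\n", gather(nnz(g)));
catch err
    % Print the identifier and message, e.g. for inclusion in a bug report.
    fprintf("Transfer failed: %s\n%s\n", err.identifier, err.message);
end
```

Running this once per device (after `gpuDevice(1)` and `gpuDevice(2)`) gives a compact record of which device and release combinations fail.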

Joss Knight on 3 Jan 2024
Hi Joseph. It's hard to be definitive. There were some problems with cuSPARSE, and also with Windows drivers, when supporting the newest GeForce Ampere cards with CUDA 11.2, but I believed these to be fixed in R2023b and CUDA 11.8. Can you raise a support ticket and follow this up with MathWorks Support? Thanks.
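If it helps when raising the ticket, the release, driver, and toolkit details can be gathered in one go. This is a minimal sketch using only properties that appear in the device listings in the question. Note that calling `gpuDevice(idx)` selects that device and resets it, which clears any existing gpuArray variables.

```matlab
% Print the MATLAB version string (includes release and update level).
fprintf("MATLAB version: %s\n", version);

% List every GPU with the properties most relevant to this issue.
n = gpuDeviceCount;
for idx = 1:n
    d = gpuDevice(idx);   % selects and resets device idx
    fprintf("Device %d: %s, compute capability %s\n", ...
        d.Index, d.Name, d.ComputeCapability);
    fprintf("  Driver version %g, toolkit version %g, available memory %.2f GB\n", ...
        d.DriverVersion, d.ToolkitVersion, d.AvailableMemory/1e9);
end
```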

Release

R2022b
