gpuDevice command very slow
I am running CUDA kernels using the Parallel Computing Toolbox in R2012a and recently upgraded to a 600-series (Kepler) GPU. To set up the CUDA kernel, we query the maximum threads per block:

gpu_han = gpuDevice(1);  % select GPU 1 and get its properties
k = parallel.gpu.CUDAKernel('gpu_tfm_linear_arb.ptx', 'gpu_tfm_linear_arb.cu');
k.ThreadBlockSize = gpu_han.MaxThreadsPerBlock;  % use the device maximum

This now executes very slowly (on the order of 2 minutes). If I instead set ThreadBlockSize manually to the card's maximum (1024 in this case), it executes in 0.1 s.
This used to run quickly with a 400-series card. Any help is gratefully received.
Accepted Answer
More Answers (2)
Andrei Pokrovsky
on 15 Sep 2016
Edited: Andrei Pokrovsky
on 15 Sep 2016
3 votes
Try setting these environment variables:
export CUDA_CACHE_MAXSIZE=2147483647
export CUDA_CACHE_DISABLE=0
This cured the problem on my GTX1080.
https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/
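A minimal sketch for Linux or macOS shells showing one way to apply these settings to a single MATLAB session. The variables presumably need to be in the environment before MATLAB starts, since the CUDA driver reads them when it initializes. The helper name `with_cuda_cache` is hypothetical, and the example assumes a `matlab` launcher on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: run a command with the CUDA JIT cache enabled and
# enlarged, using the values from the answer above. The assignments are
# placed in the environment of the launched program only.
with_cuda_cache() {
  CUDA_CACHE_DISABLE=0 CUDA_CACHE_MAXSIZE=2147483647 "$@"
}

# Example (assumes a `matlab` launcher on PATH; uncomment to use):
# with_cuda_cache matlab
```

To apply the setting to every session instead, the two `export` lines from the answer could go in your shell's startup file (for example `~/.bashrc`).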
Anthony
on 17 Jun 2013
0 votes
2 comments
Edric Ellis
on 18 Jun 2013
The cache is not stored where the program lives. This page from NVIDIA has all the gory details, including the default cache locations:
- on Windows, %APPDATA%\NVIDIA\ComputeCache,
- on MacOS, $HOME/Library/Application\ Support/NVIDIA/ComputeCache,
- on Linux, ~/.nv/ComputeCache
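To check whether the JIT cache is actually being populated, and how large it has grown, you can inspect that directory. A minimal sketch, assuming a POSIX shell on Linux or macOS and the default locations listed above; the helper name `nv_cache_dir` is hypothetical. It also honours NVIDIA's `CUDA_CACHE_PATH` variable, which overrides the default location when set. Windows is not handled.

```shell
#!/bin/sh
# Print the default NVIDIA compute-cache directory for an OS name as
# reported by `uname -s` (Linux or Darwin). Returns 1 for anything else;
# Windows uses %APPDATA%\NVIDIA\ComputeCache and is not handled here.
nv_cache_dir() {
  case "$1" in
    Linux)  printf '%s\n' "$HOME/.nv/ComputeCache" ;;
    Darwin) printf '%s\n' "$HOME/Library/Application Support/NVIDIA/ComputeCache" ;;
    *)      return 1 ;;
  esac
}

# CUDA_CACHE_PATH, if set, overrides the default location.
if [ -n "${CUDA_CACHE_PATH:-}" ]; then
  dir=$CUDA_CACHE_PATH
else
  dir=$(nv_cache_dir "$(uname -s)") || dir=""
fi

if [ -n "$dir" ] && [ -d "$dir" ]; then
  du -sh "$dir"        # total size of the cached JIT binaries
elif [ -n "$dir" ]; then
  echo "no compute cache yet at: $dir"
else
  echo "unsupported OS: $(uname -s)" >&2
fi
```

If the directory stays empty or tiny after a slow run, the cache is probably disabled or too small to hold the compiled kernels.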
Anthony
on 12 Jul 2013