CUDA kernel MaxThreadsPerBlock not constant

Question

Martin Strambach on 30 Jan 2020

Direct link to this question

https://es.mathworks.com/matlabcentral/answers/502867-cuda-kernel-maxthreadsperblock-not-constant

Answered: Edric Ellis on 3 Feb 2020

I create a CUDA kernel using KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC) and compute the block size from KERN.MaxThreadsPerBlock. That value seems to vary depending on which function is used to build the kernel. I had presumed MaxThreadsPerBlock depended only on the gpuDevice properties, but so far it seems there might be some connection to the number of function parameters. Can someone explain how this value is actually determined, or am I missing something?

I'm using MATLAB R2019b, GCC 8.3, and CUDA Toolkit 10.1 with an NVIDIA V100 (CC 7.0).

2 comments

Joss Knight on 2 Feb 2020

I can't work out how you'd see this for the same device. Can you post some reproduction code?

Martin Strambach on 2 Feb 2020

example_code.zip

Hi Joss, thanks for your reply!

You can find an example in the attachment. It's not exactly a minimal working example, but it should do. The file computeITT contains two entry points, one for single precision and the other for double precision. I've also attached the PTX code (GCC 8.3, CUDA Toolkit 10.1.243), compiled as follows: nvcc -ptx --gpu-architecture=compute_70. The rest of the files are just includes. When you construct a kernel with the single-precision entry point, KERN.MaxThreadsPerBlock is 1024; when you do the same for the double-precision entry point, KERN.MaxThreadsPerBlock is 512.

As odd as it sounds, MaxThreadsPerBlock for double precision isn't always half of the single-precision value.


Answer 1

Edric Ellis on 3 Feb 2020

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/502867-cuda-kernel-maxthreadsperblock-not-constant#answer_413458

In your comment you mention that you see different values of MaxThreadsPerBlock for different kernels. This is expected. The CUDAKernel object builds on the underlying CUDA Driver API. Different kernel functions have different requirements in terms of shared memory, registers, and other resources, and these requirements affect how many threads per block can be launched. This is described (briefly) in the CUDA Driver API reference documentation: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b (In case that link goes stale: it describes the function cuFuncGetAttribute, which allows you to query the CUDA function attribute CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.)
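
As a rough illustration of the point above, the following sketch derives the launch configuration from each kernel's own MaxThreadsPerBlock rather than from the device-wide value. The file names, entry-point names, and problem size are hypothetical placeholders (the thread does not give them), and launchConfig is a helper made up for this example, not part of any toolbox. It assumes a one-dimensional launch and requires a supported GPU plus the compiled PTX file to run.

```matlab
% Hypothetical placeholders: substitute your own files, entry points, and size.
ptxFile     = 'computeITT.ptx';
cuFile      = 'computeITT.cu';
entryPoints = {'computeITT_single', 'computeITT_double'};
numElements = 1e6;

% Device-wide upper bound; an individual kernel may be limited to fewer threads.
dev = gpuDevice;
fprintf('Device limit: %d threads per block\n', dev.MaxThreadsPerBlock);

for k = 1:numel(entryPoints)
    kern = parallel.gpu.CUDAKernel(ptxFile, cuFile, entryPoints{k});

    % Use the per-kernel limit reported for this particular entry point.
    [blockSize, gridSize] = launchConfig(kern.MaxThreadsPerBlock, numElements);
    kern.ThreadBlockSize = [blockSize, 1, 1];
    kern.GridSize        = [gridSize, 1];

    fprintf('%s: MaxThreadsPerBlock = %d, grid = %d blocks\n', ...
            entryPoints{k}, kern.MaxThreadsPerBlock, gridSize);
end

function [blockSize, gridSize] = launchConfig(maxThreadsPerBlock, numElements)
    % Hypothetical helper: take the largest block this kernel allows and
    % enough blocks to cover every element (the last block may be partial).
    blockSize = maxThreadsPerBlock;
    gridSize  = ceil(numElements / blockSize);
end
```

Because the last block may extend past numElements, the kernel itself would need a bounds check on its global index. Choosing the largest permitted block is only one possible policy; a smaller block size can sometimes perform better.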
