Unable to compile cuda code containing dynamic parallelism: Error: “ptxas fatal : Unresolved extern function 'cublasCreate_v2'”

Question

Petter Stefansson on 7 Jun 2017

https://es.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern

Commented: Petter Stefansson on 16 Jun 2017

I’m trying to create a simple MEX function that calls cuBLAS functions such as cublasDgemm from inside a kernel, so that I can use nested (dynamic) parallelism in my calculations. This is supposed to be supported on newer GPUs such as the GTX 1080 I’m using.

However, when I try to compile my CUDA code from MATLAB like this:

mexcuda CUBLAS_dgemm.cu -lcublas

I get the error:

Building with 'NVIDIA CUDA Compiler'.
Error using mex
ptxas fatal   : Unresolved extern function 'cublasCreate_v2'
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are
deprecated, and may be removed in a future release (Use
-Wno-deprecated-gpu-targets to suppress warning).
CUBLAS_dgemm.cu

As soon as I comment out everything in my kernel that is related to cuBLAS, it compiles fine again. Could someone please advise me on what I need to do to get this to compile and work? I would really appreciate it.

The sample cuda code I’ve written to test this looks like this:

#include "mex.h" 
#include "cublas_v2.h"
#include <cuda_runtime.h>
/* Kernel code with dgemm */
__global__ void dgemmkernel(const double* deviceX, double* XX, const int n, const int m) {  
    /* Cublas handle */
    cublasHandle_t handle;
    cublasCreate(&handle);
    /* Scalar constants */
    double alpha = 1.0, beta = 0.0;
    /* Calculate XX = X'*X using cublasDgemm. */
    cublasDgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, n, n, m, &alpha, deviceX, m, deviceX, m, &beta, XX, n); 
}
/* The Matlab gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
  /* Host-side variables */
  const double *X; // Host-side input X.
  double *Output1; // Matlab output.
  size_t m, n;     // size variables.
  /* Device-side variables. */
  double *deviceX; // Device-side version X.
  double *XX;   // GPU version XX.
  /* Get pointers to input host-side array X from Matlab */
  X = mxGetPr(prhs[0]);
  
  /* Get the dimensions of the input variables */
  m = mxGetM(prhs[0]);   // Number of rows in X.
  n = mxGetN(prhs[0]);   // Number of columns in X.
  /* Allocate memory on the device for the variables involved in the calculations. */
  cudaMalloc(&deviceX, m * n * sizeof(double)); // [m-by-n]
  cudaMalloc(&XX, n * n * sizeof(double));      // [n-by-n]
  
  /* Use cudaMemcpy to copy X from host to device */
  cudaMemcpy(deviceX, X, (m*n) * sizeof(double), cudaMemcpyHostToDevice);
  /* Call dgemm kernel */
  dgemmkernel<<<1, 1>>>(deviceX, XX, n, m);
  /* Deliver results back to matlab as host-side variables */            
  plhs[0] = mxCreateDoubleMatrix(n, n, mxREAL);
  Output1 = mxGetPr(plhs[0]);
  cudaMemcpy(Output1, XX, (n*n) * sizeof(double), cudaMemcpyDeviceToHost);
  /* Free the cudaMalloc'ed arrays from the device before exit */
  cudaFree(deviceX);
  cudaFree(XX);
}
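For checking the output once the MEX file builds, a host-only reference for the same product can be handy. The sketch below is not from the original post: it assumes X is stored column-major (the MATLAB layout, which is also what the cublasDgemm call above assumes with leading dimension m), and the function name gramReference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference for XX = X' * X (illustrative, hypothetical name).
// X is m-by-n and stored column-major, so element (i, j) is X[i + j * m].
// The result is n-by-n, also column-major.
std::vector<double> gramReference(const std::vector<double>& X,
                                  std::size_t m, std::size_t n) {
    assert(X.size() == m * n);  // the buffer must hold exactly m*n values
    std::vector<double> XX(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k) {
                sum += X[k + i * m] * X[k + j * m];  // column i dot column j
            }
            XX[i + j * n] = sum;
        }
    }
    return XX;
}
```

Comparing this against the array returned by the MEX function (within a floating-point tolerance) would show whether the transpose flags and leading dimensions in the dgemm call are what was intended.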

2 comments

Joss Knight on 7 Jun 2017

Did you try using the -dynamic flag to mexcuda?

Petter Stefansson on 7 Jun 2017

No, I hadn’t tried that, but now I have. It still doesn’t work, but the error message has changed to:

Error using mex
nvlink error   : Undefined reference to 'cublasCreate_v2' in
'C:/Users/Petteri7/AppData/Local/Temp/mex_150043189848302_4708/CUBLAS_dgemm.obj'
(target: sm_35)
nvlink error   : Undefined reference to 'cublasDgemm_v2' in
'C:/Users/Petteri7/AppData/Local/Temp/mex_150043189848302_4708/CUBLAS_dgemm.obj'
(target: sm_35)
CUBLAS_dgemm.obj
c_mexapi_version.obj

Answer 1

Joss Knight on 12 Jun 2017

https://es.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern#answer_270427

You need to link against the cuBLAS device library in the device-linking stage, and unfortunately there isn't a proper formal API to do this. You can use the variable NVCC_FLAGS to add it there, and then the standard -L and -l options to add it to the host-linking stage. In my example command below, the cuBLAS device library is located in /usr/local/cuda/lib64; replace this with the lib64 directory of wherever you've installed the CUDA Toolkit.

mexcuda -v CUBLAS_dgemm.cu -dynamic NVCC_FLAGS=-lcublas_device -L/usr/local/cuda/lib64 -lcublas_device

In the long term I'll take this as a request to have a more convenient way of linking in other device libraries.

1 comment

Petter Stefansson on 16 Jun 2017

Yes that finally worked! Amazing. Thank you for the help!

Answer 2

Joss Knight on 7 Jun 2017

https://es.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern#answer_269922

I don't think you can call cublasDgemm inside a kernel - that's not the way dynamic parallelism works. You can't just call any old host code that happens to contain kernel launches. You can, as always, call device code, and what dynamic parallelism lets you do is launch other kernels.

2 comments

Petter Stefansson on 7 Jun 2017

Are you sure that’s not how dynamic parallelism works? NVIDIA’s “Dynamic parallelism in CUDA” technical notes specifically state, for example:

'A kernel can also call GPU libraries such as CUBLAS directly without needing to return to the CPU..'

As I understood it, starting from toolkit 5.0 any device with at least compute capability 3.5 should be able to do this. This feature is the main reason I upgraded from a lower CC version GPU, so it would be extremely disappointing if I have misunderstood the situation.
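The compute capability 3.5 minimum mentioned in this comment can be expressed as a small host-side check. This is only a sketch: the function name is hypothetical, and in a real MEX file the major and minor values would presumably come from the major and minor fields that cudaGetDeviceProperties fills in, which is left out here so the snippet stays free of CUDA headers.

```cpp
#include <cassert>

// Returns true when a device of compute capability major.minor meets the
// 3.5 minimum quoted in the comment above for dynamic parallelism.
// Hypothetical helper; the name is not part of any CUDA or MATLAB API.
bool meetsDynamicParallelismMinimum(int major, int minor) {
    assert(major >= 0 && minor >= 0);  // capabilities are non-negative
    return major > 3 || (major == 3 && minor >= 5);
}
```

A GTX 1080 reports compute capability 6.1, so meetsDynamicParallelismMinimum(6, 1) returns true, while a 3.0 device would be rejected.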

Joss Knight on 12 Jun 2017

I did not realise that! Thank you! The MEXCUDA dynamic MEX options file is not properly set up to link against the cublas device library so you'll need to co-opt some existing MEX variables to get the right command line options. See my other Answer for how to do this.
