Unable to compile cuda code containing dynamic parallelism: Error: “ptxas fatal : Unresolved extern function 'cublasCreate_v2'”
14 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Petter Stefansson
el 7 de Jun. de 2017
Comentada: Petter Stefansson
el 16 de Jun. de 2017
I’m trying to create a simple mex function that calls cublas functions such as cublasDgemm from inside a kernel so I can utilize nested, or dynamic, parallelism in my calculations which is supposed to be supported on newer GPUs such as the GTX1080 I’m using.
However, when I try to compile my cuda code from Matlab like this:
mexcuda CUBLAS_dgemm.cu -lcublas
I get the error:
Building with 'NVIDIA CUDA Compiler'.
Error using mex
ptxas fatal : Unresolved extern function 'cublasCreate_v2'
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are
deprecated, and may be removed in a future release (Use
-Wno-deprecated-gpu-targets to suppress warning).
CUBLAS_dgemm.cu
And as soon as I comment out everything inside my kernel which is related to cublas it works fine again...Could someone please advise me on what I need to do to get this to compile and work? I would really appreciate it.
The sample cuda code I’ve written to test this looks like this:
#include "mex.h"
#include "cublas_v2.h"
#include <cuda_runtime.h>
/* Kernel code with dgemm */
__global__ void dgemmkernel(const double* deviceX, double* XX, const int n, const int m) {
/* Cublas handle */
cublasHandle_t handle;
cublasCreate(&handle);
/* Scalar constants */
double alpha = 1.0, beta = 0.0;
/* Calculate XX = X'*X using cublasDgemv. */
cublasDgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, n, n, m, &alpha, deviceX, m, deviceX, m, &beta, XX, n);
}
/* The Matlab gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
/* Host-side variables */
const double *X; // Host-side input X.
double *Output1; // Matlab output.
size_t m, n; // size variables.
/* Device-side variables. */
double *deviceX; // Device-side version X.
double *XX; // GPU version XX.
/* Get pointers to input host-side array X from Matlab */
X = mxGetPr(prhs[0]);
/* Get the dimensions of the input variables */
m = mxGetM(prhs[0]); // Number of rows in X.
n = mxGetN(prhs[0]); // Number of columns in X.
/* Allocate memory on the device for the variables involved in the calculations. */
cudaMalloc(&deviceX, m * n * sizeof(double)); // [m-by-n]
cudaMalloc(&XX, n * n * sizeof(double)); // [n-by-n]
/* Use cudaMemcpy to copy X from host to device */
cudaMemcpy(deviceX, X, (m*n) * sizeof(double), cudaMemcpyHostToDevice);
/* Call dgemm kernel */
dgemmkernel<<<1, 1>>>(deviceX, XX, n, m);
/* Deliver results back to matlab as host-side variables */
plhs[0] = mxCreateDoubleMatrix(n, n, mxREAL);
Output1 = mxGetPr(plhs[0]);
cudaMemcpy(Output1, XX, (n*n) * sizeof(double), cudaMemcpyDeviceToHost);
/* Free the cudaMalloc'ed arrays from the device before exit */
cudaFree(deviceX);
cudaFree(XX);
}
2 comentarios
Respuesta aceptada
Joss Knight
el 12 de Jun. de 2017
You need to link against the cublas device library in the device linking stage and unfortunately there isn't a proper formal API to do this. You can use the variable NVCC_FLAGS to add it there, and then the standard -L and -l options to add it to the host linking stage. In my example command below the cublas device library is located at /usr/local/cuda/lib64 - you should substitute this for the lib64 directory wherever you've installed the CUDA Toolkit.
mexcuda -v CUBLAS_dgemm.cu -dynamic NVCC_FLAGS=-lcublas_device -L/usr/local/cuda/lib64 -lcublas_device
In the long term I'll take this as a request to have a more convenient way of linking in other device libraries.
Más respuestas (1)
Joss Knight
el 7 de Jun. de 2017
I don't think you can call cublasDgemm inside a kernel - that's not the way dynamic parallelism works. You can't just call any old host code that happens to contain kernel launches. You can, as always, call device code, and what dynamic parallelism lets you do is launch other kernels.
2 comentarios
Joss Knight
el 12 de Jun. de 2017
I did not realise that! Thank you! The MEXCUDA dynamic MEX options file is not properly set up to link against the cublas device library so you'll need to co-opt some existing MEX variables to get the right command line options. See my other Answer for how to do this.
Ver también
Categorías
Más información sobre Get Started with GPU Coder en Help Center y File Exchange.
Productos
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!