
Mex & Cuda

jason beckell on 24 Jan 2012
Edited: Cogli on 10 Mar 2016
Hello everybody,
I'm a student new to MATLAB and CUDA. I have to write a simple MEX file that takes a vector as input and calls a routine from a CUDA shared library (void timestwo(float *x, float *y, int n)) which simply multiplies each element of the vector by two.
This is the code of the mex file:
#include "mex.h"
#include "matrix.h"
/* Header file of the shared library */
#include "doppiolgcc.h"

void myExitFcn()
{
    mexPrintf("MEX-file is being unloaded");
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs,
                 const mxArray *prhs[])
{
    double *x, *y;
    int i;
    int mrows, ncols;

    /* The input must be a noncomplex floating-point vector */
    mrows = mxGetM(prhs[0]);
    ncols = mxGetN(prhs[0]);
    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || !(ncols == 1)) {
        mexErrMsgTxt("Input must be a noncomplex floating-point vector.");
    }

    /* Assign pointers to each input and output. */
    x = mxGetPr(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(mrows, ncols, mxREAL);
    y = mxGetPr(plhs[0]);

    /* Call the external routine */
    timestwo(x, y, mrows);

    if (mexAtExit(myExitFcn)) {
        mexPrintf("Error unloading function!");
    }
}
This is the code of the header file:
extern "C" void timestwo(float *x, float *y, int LEN);
And this is the CUDA implementation of that routine:
const int N = 256;

__global__ void vecAdd(float *A, float *B)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    B[i] = A[i] * 2.0;
}

extern "C" void timestwo(float *x, float *y, int len)
{
    /* Pointers to device memory */
    float *x_d, *y_d;

    /* Allocate arrays x_d, y_d on the device */
    cudaMalloc((void **) &x_d, sizeof(float) * len);
    cudaMalloc((void **) &y_d, sizeof(float) * len);

    /* Copy data from host memory to device memory */
    cudaMemcpy(x_d, x, sizeof(float) * len, cudaMemcpyHostToDevice);

    /* Launch the computation */
    vecAdd<<< N/len, len >>>(x_d, y_d);

    /* Copy data from device memory to host memory */
    cudaMemcpy(y, y_d, sizeof(float) * len, cudaMemcpyDeviceToHost);

    /* Free the device memory */
    cudaFree(x_d);
    cudaFree(y_d);
}
After doing so, I compile successfully and then launch my application. I initialize my input variable like this:
for i=1:256 a(i)=i; a=a' end
This is the final output:
b = doppiom(a, i)
b =
         256
         512
         768
         ...
       32512
       32768
           0
           0
         ...
           0
(256 elements in total: the first 128 are 256, 512, 768, ..., 32768; the last 128 are all zero)
So I obtain b[i] = 256*i for the first half of the elements and zeros for the rest, instead of b[i] = 2*a[i] for all i. Why doesn't it work?
Thank you all very much!
Jason.

Answers (2)

Friedrich on 24 Jan 2012
Hi,
I am not a CUDA expert, but as far as I can tell the reason for this behavior is the way you call vecAdd:
vecAdd<<< N/len, len>>>(x_d, y_d);
You start 256/len blocks, each with len threads. I would rather try something like this:
vecAdd<<< 1, len>>>(x_d, y_d);
and in vecAdd do:
int i = threadIdx.x;
B[i] = A[i]*2.0;
Since there is a limit (1024) on the number of threads per block, this won't work for longer vectors. So if you'd like blocks of 256 threads, I would try this:
vecAdd<<< len/N, N>>>(x_d, y_d);
That way you get len/N blocks, each running N threads. And in vecAdd do what you already do:
int i = threadIdx.x + blockDim.x * blockIdx.x;
B[i] = A[i]*2.0;

jason beckell on 25 Jan 2012
Thank you very much, Friedrich, for your suggestion! It's been very kind of you! In any case, the main problem was that the CUDA file expected float variables as inputs, whereas MATLAB passed it only double variables. Thank you all very much again!
1 comment
Cogli on 10 Mar 2016
Edited: Cogli on 10 Mar 2016
I have encountered the same situation. I used the double type in both the main MEX .cpp file and the custom .cu file, and my returned result (i.e. plhs) is always 0.
What did you mean by "the main problem was that the Cuda file expected float variables as inputs, whereas Matlab passed to it only double variables"?

