When to use MEX in a MATLAB program (Example of an Ulam Map)

I am attempting to learn how to write MEX programs to use in my MATLAB programs when there is need. As far as I know, one does this when one encounters a bottleneck that cannot be suitably vectorized, i.e. when one needs to repeat a loop a large number of times and this loop is unavoidable. However, in the program (see below) that I wrote to test this, I am not finding a significant speedup of MEX over MATLAB, and am wondering whether a) I have done something wrong, b) the example I chose wasn't a good one, or c) MEX does not offer that substantial a speedup. Any advice in this matter would be greatly appreciated.
This is the MEX program I wrote to iterate the Ulam Map.
#include <math.h>
#include "mex.h"

/* A program to iterate the Ulam Map in MEX.
 * Call from MATLAB using UlamMap(seed, iter_num) */
#define B_OUT plhs[0]
#define S_IN  prhs[0]
#define I_IN  prhs[1]

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *B, S;
    int I, n;

    if (nrhs < 1 || nrhs > 2)
        mexErrMsgTxt("Wrong number of input arguments.");
    else if (nlhs > 1)
        mexErrMsgTxt("Too many output arguments.");

    /* The iteration count defaults to 4096 when it is not supplied. */
    if (nrhs == 1)
        I = 4096;
    else
        I = (int)mxGetScalar(I_IN);
    if (I < 1)
        mexErrMsgTxt("iter_num must be at least 1.");

    S = mxGetScalar(S_IN);
    B_OUT = mxCreateDoubleMatrix(1, I, mxREAL);
    B = mxGetPr(B_OUT);

    /* Iterate x -> 1 - 2*x^2, storing every value of the orbit. */
    B[0] = S;
    for (n = 1; n < I; n++)
    {
        B[n] = 1.0 - 2.0 * pow(B[n-1], 2);
    }
    return;
}
This is the wrapper script which executes this program, as well as a MATLAB version.
format long
x0 = 0.1;
iter = 4000096;
tic
vals = zeros(1, iter);
vals(1) = x0;
for i = 2:iter
    vals(i) = 1-2*vals(i-1)^2;
end
toc
tic
b = UlamMap(x0, iter);
toc
sum((vals-b).^2)
With this large number of iterations, I see almost no speedup; with a smaller number, the speedup is around a factor of 1.5. Is this expected? What can I do to maximize the speedup from MEX? Is this an appropriate example?
Thanks,
Danny

Accepted Answer

Jan
Jan on 15 Jul 2012
Edited: Jan on 15 Jul 2012
POW(X,2) is much slower than X*X, because it is a general algorithm that can also calculate POW(X,2.1). Therefore James's suggestion is much faster than the original method.
A further improvement:
B[0] = S;
for (n = 1; n < I; n++) {
    S = 1.0 - 2.0 * S * S;
    B[n] = S;
}
This runs in 0.044 sec.
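The idea in this loop is to carry the current value in a local scalar instead of reading `B[n-1]` back from the array on every pass. Below is a minimal standalone sketch of the same loop as a plain C function, without the MEX gateway. The function name `ulam_fill` and its signature are assumptions made for illustration, not part of the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..n-1] with the Ulam orbit starting at `seed`.
 * The running value lives in the local `s`, so each pass does one
 * multiply-and-subtract and one store, with no load from the array. */
void ulam_fill(double *out, size_t n, double seed)
{
    assert(out != NULL || n == 0);
    if (n == 0) {
        return;
    }
    double s = seed;
    out[0] = s;
    for (size_t i = 1; i < n; i++) {
        s = 1.0 - 2.0 * s * s;
        out[i] = s;
    }
}
```

In a MEX file, `out` would be the pointer returned by `mxGetPr` and `n` the requested number of iterations.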
3 comments
Jan
Jan on 15 Jul 2012
Even mxMalloc returns zeros, to my surprise. At least, I tested this 1e10 times with different array sizes and never got any junk from formerly used memory. But then what is the purpose of mxCalloc?
Anyhow, mxCreateDoubleMatrix has a small overhead for creating the header of the variable, e.g. the dimensions, the type, a lot of flags, etc. But look at James's FEX submission UNINIT. This creates an uninitialized array and is faster under some circumstances. Let me mention, though, that my investigations of when UNINIT is faster for my programs took hours, while in the end it saved less than a second of run time. But if you need to squeeze the last microseconds out of your code, it is worth trying.
James Tursa
James Tursa on 16 Jul 2012
Edited: James Tursa on 16 Jul 2012
@Jan: My experience with mxMalloc & mxCalloc matches yours. They both always returned 0's for data. Even when I allocated a very large memory block with mxMalloc, filled it with non-zero data, deallocated it, then immediately allocated again with mxMalloc & got the same address returned, the data had been wiped clean to 0's again. The only way I have been able to avoid this performance hit is with the undocumented API functions such as used in the UNINIT routine. And even here, as you point out, it is only worth doing if for some reason the overall application requires very many large memory block allocations from scratch. But when you do need this capability it is nice to have.

More Answers (1)

James Tursa
James Tursa on 15 Jul 2012
For starters, get rid of the function call overhead:
B[n] = 1.0-2.0*(B[n-1]*B[n-1]);
You can also play games with *B vs B[etc] and can sometimes also get slight speed improvements ... highly dependent on compiler.
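The "*B vs B[etc]" remark refers to walking the output array with a moving pointer instead of an index. Here is a minimal sketch of both forms as plain C functions. The names `ulam_fill_indexed` and `ulam_fill_pointer` are hypothetical. Both perform identical arithmetic, so any speed difference comes only from the code the compiler happens to generate.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: out[i] is computed from out[i-1]. */
void ulam_fill_indexed(double *out, size_t n, double seed)
{
    assert(out != NULL || n == 0);
    if (n == 0) {
        return;
    }
    out[0] = seed;
    for (size_t i = 1; i < n; i++) {
        out[i] = 1.0 - 2.0 * (out[i - 1] * out[i - 1]);
    }
}

/* Pointer form: p walks the array and p[-1] is the previous element. */
void ulam_fill_pointer(double *out, size_t n, double seed)
{
    assert(out != NULL || n == 0);
    if (n == 0) {
        return;
    }
    double *p = out;
    double *end = out + n;
    *p = seed;
    for (p++; p < end; p++) {
        *p = 1.0 - 2.0 * (p[-1] * p[-1]);
    }
}
```

Whether one form is faster can only be settled by timing both with the compiler and flags actually in use.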
1 comment
Jan
Jan on 15 Jul 2012
Edited: Jan on 15 Jul 2012
This reduces the runtime on Matlab 2009a/64/Win7 from 0.56 sec to 0.061 sec.
The POW method returns substantially different values than the Matlab version: after the 141st element the values differ by up to 2. With B[n-1]*B[n-1] the values are equal.
