My mex file is slower than my original matlab equivalent
10 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Mohammad Shojaei Arani
el 18 de Jul. de 2022
Comentada: Mohammad Shojaei Arani
el 19 de Jul. de 2022
Hello friends,
I need to calculate some quantities of linear algeibra type, so they are merely matrix and vector products. The following is an example
EZ=[(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
where Ds0,Ds1,Ds2,Dm0,Dm1,Dm2 are 1*n vectors. When I do the calculations using matlabFunction (attached) it is fast. However, I am not satisfied since I really need to do such calculations thousands of time s(if not millions of times). To overcome this issue I decided to give mex a try. Unfortunately, the equivalent mex file (which I made by matlab coder) is slower 2-3 times (I could not upload it here, unforetunately).
Is there any hope to create a mex file out of this function which is much faster? I hope so!
Thanks for your help in advance,
Babak
7 comentarios
Bruno Luong
el 18 de Jul. de 2022
Editada: Bruno Luong
el 18 de Jul. de 2022
So you don't think simplying expression matters? I do think the contrary. Everytime in the expression there is a gpu array involves there is a whole transfer data from cpu to gpu, you mighte have 200 such terms in your expression, I don't even try to count or understand your code as it is a so unreable and messy expression.
If you a raw unsimplified expression like yours, throw it in the computer and ask why it doesn't accelerate, you need to think a much more lower level how it works.
Respuesta aceptada
Jan
el 18 de Jul. de 2022
Just some experiments. You can gain some clarity, but hardly improve the speed with this simplifications. I've tried a loop version also.
n = 1e4;
Ds0 = rand(1, n);
Ds1 = rand(1, n);
Ds2 = rand(1, n);
Dm0 = rand(1, n);
Dm1 = rand(1, n);
Dm2 = rand(1, n);
tic;
for rep = 1:1e4
EZ = [(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
end
toc
tic;
for rep = 1:1e4
Ds0_2 = Ds0 .* Ds0;
Dm0_2 = Dm0 .* Dm0;
EZ2 = [(1 ./ Ds0_2 .* (Ds0 .* (Dm0 * 2 + Dm0 .* Ds1 .^ 2 - ...
Ds0 .* (Ds1 + Dm0 .* Ds2 + Dm1 .* Ds1 - Dm2 .* Ds0 ./ 2) + ...
Dm0 .* Dm1) - Dm0_2 .* Ds1)) / 2; ...
1 ./ Ds0_2 .* (Ds0 .* (Ds0 .* (Dm1 + Ds1 .^ 2 / 4 - Ds0 .* Ds2 / 2 + 1) - ...
Dm0 .* Ds1 * 2) + Dm0_2);
(Dm0 * 3 - Ds0 .* Ds1 * 1.5) ./ Ds0];
end
toc
tic;
for rep = 1:1e4
EZ3 = zeros(3, n);
for k = 1:n
a = Ds0(k);
b = Dm0(k);
c = Ds1(k);
d = Dm1(k);
e = Ds2(k);
EZ3(1, k) = (1 / a^2 * (a * (b * 2 + b * c ^ 2 - ...
a * (c + b * e + d * c - Dm2(k) * a / 2) + b * d) - b^2 * c)) / 2;
EZ3(2, k) = (a * (a * (d + c ^ 2 / 4 - a * e / 2 + 1) - b * c * 2) + b^2) / a^2;
EZ3(3, k) = b * 3 / a - c * 1.5;
end
end
toc
max(abs(EZ(:) - EZ2(:)))
max(abs(EZ(:) - EZ3(:)))
5 comentarios
Jan
el 18 de Jul. de 2022
@Mohammad Shojaei Arani: The rules are straight:
- Avoid repeated work. If a calculation appears repeatedly, compute it once and store it in a temporary variable.
- Reduce the call to expensive functions: exp, power, trigonometric functions, faculty, ...
- Combine operations, but keep in mind, that the result can be influenced by rounding effects. E.g. 1/a*b takes more time than b/a, but the result can be slightly different.
The clarity of the code improves the time needed for debugging:
- Spaces around operators.
- Compact names of variables.
- Be careful with using parentheses, if they are not required.
- Avoid elementwise operators, if the calculation does not need it. 3.0.*2.0 is harder to read then 3 * 2.
Bruno's point is important: The result of numerically instable functions can be influenced massively by simplifications. A basic example:
1e17 + 1 - 1e17
1e17 - 1e17 + 1
Más respuestas (0)
Ver también
Categorías
Más información sobre Write C Functions Callable from MATLAB (MEX Files) en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!