Inlined code segment slower than internal function pass - why?

Carson Purnell on 10 Feb 2023
Edited: Matt J on 12 Feb 2023
I'm trying to speed up prototype code and have found a strange speed increase when replacing standard inlined code that I'm using inside a loop. The inlined code is as follows:
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
fidx=s<sel.rad^2;
ix=find(fidx);
Somehow, profiling shows this to be 2-3x slower than moving it into a local function in the same script, called as ix = rangesearchnest(loc,sel.rad,dynpts); with an identical body (different variable names). I don't see how this could be the case under any circumstances: my understanding is that the JIT and internal optimizations should work better on inlined code than on function calls. Moreover, dynpts is an n-by-3 array where n is in the millions to billions, so I was expecting a tremendous speed increase from the inlined version merely as a result of not needing to pass the gargantuan array as an argument (and avoiding potential memory limit issues).
Is there special-case behavior I'm not aware of happening here?
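For reference, here is a minimal sketch of what the local function described above might look like. The call signature comes from the question; the argument names inside the function (c, r, pts), the small test array, and the scalar rad standing in for sel.rad are assumptions made only for illustration.

```matlab
% Hypothetical reconstruction of the local function from the question.
loc    = [0.5 0.5 0.5];   % query point (1-by-3)
rad    = 0.1;             % search radius, stands in for sel.rad
dynpts = rand(1e5,3);     % small stand-in for the n-by-3 point array

ix = rangesearchnest(loc, rad, dynpts);

function ix = rangesearchnest(c, r, pts)
    % Accumulate the squared distance one coordinate at a time.
    d2 = 0;
    for k = 1:numel(c)
        d2 = d2 + (pts(:,k) - c(k)).^2;
    end
    % Indices of the points strictly inside the radius.
    ix = find(d2 < r^2);
end
```

Local functions inside a script file need R2016b or later and must appear at the end of the file.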
1 Comment
Walter Roberson on 11 Feb 2023
x = find(s<sel.rad^2);
is potentially better optimized than the two-statement version.
In numeric cases where the < ordering is guaranteed not to return errors, MATLAB could potentially evaluate s(K)<sel.rad^2 in a loop, gathering indices as it goes (perhaps into a linked list), instead of first calculating s<sel.rad^2 as a logical vector and then doing a find() operation on the result.
In order to determine whether it does that kind of operation, you would probably need to use large matrices, right on the boundary, where calculating s<sel.rad^2 first would exhaust your memory.
The language model is to calculate the logical vector first, but in most languages, internal optimizations are permitted to vary order of operations provided that the result is the same when no exceptions occur.
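One way to probe this on a given machine is to time both forms directly. The sketch below only measures; it says nothing definite about what the interpreter does internally. The variable names s and r2, the array size, and the helper twoStep are assumptions introduced for the test.

```matlab
s  = rand(1e6,1);   % stand-in for the squared distances
r2 = 0.1^2;         % stand-in for sel.rad^2

% Two-statement form from the question: explicit logical mask, then find.
fidx  = s < r2;
ixTwo = find(fidx);

% One-statement form: the mask exists only as a temporary.
ixOne = find(s < r2);

% Both forms must return the same indices.
sameResult = isequal(ixOne, ixTwo);

% Time each form; timeit runs the handle several times and reports a typical time.
tTwo = timeit(@() twoStep(s, r2));
tOne = timeit(@() find(s < r2));
fprintf('two-statement: %.4f s, one-statement: %.4f s\n', tTwo, tOne);

function ix = twoStep(s, r2)
    % Hypothetical helper wrapping the two-statement form so timeit can call it.
    fidx = s < r2;
    ix = find(fidx);
end
```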


Answers (1)

Matt J on 10 Feb 2023
Edited: Matt J on 10 Feb 2023
"with the inlined version merely as a result of not needing to pass the gargantuan array as an argument (and potential memory limit issues)."
Passing a variable to a function does not result in any memory copying unless the function makes changes to the variable, which you are not doing. Also, my recollection of how the JIT works is that it optimizes the execution of functions, but not scripts. So, if your top-level code is not enclosed in a function, that might be part of it as well.
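The copy-on-write behavior described above can be observed with a rough timing sketch. The helper names readOnly and writeFirst and the array size are assumptions for illustration, and the timings will vary by machine and release.

```matlab
n  = 1e7;
A  = rand(n,3);
a0 = A(1);   % remember the original value to show the caller's array is untouched

% Reading the input does not copy it; writing to it forces a full copy first.
tRead  = timeit(@() readOnly(A));
tWrite = timeit(@() writeFirst(A));
fprintf('read-only call: %.6f s, modifying call: %.6f s\n', tRead, tWrite);

% The caller's array is unchanged either way, because the write hit a copy.
unchanged = (A(1) == a0);

function v = readOnly(X)
    % Hypothetical helper: only reads X, so X shares memory with the caller's array.
    v = X(1);
end

function v = writeFirst(X)
    % Hypothetical helper: assigning into X makes MATLAB copy the array first.
    X(1) = 0;
    v = X(1);
end
```

If the modifying call is much slower than the read-only call, the difference is mostly the cost of duplicating the array, which the read-only path never pays.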
3 Comments
Carson Purnell on 12 Feb 2023
Alright, after some more testing, things... are still confusing. I tried a vectorized solution both inlined and as a function (note that sel.rad is a scalar, not a vector as Walter appeared to assume):
s = sum((dynpts-loc).^2,2);
ix5=find(s<sel.rad^2);
This did increase the speed in the script, and it slowed the function down to nearly identical speed (within function-call overhead, probably). So the loop in the script was slowest, the vectorized version was fast either way, but the loop in the function is still the fastest. That does make it look like the script prevented optimization of the loop, but it does not explain why the external loop is faster. Maybe 1xm vector math is faster than doing things in the array?
Good to know that arguments don't need a full memory copy under those conditions.
Matt J on 12 Feb 2023
Edited: Matt J on 12 Feb 2023
I don't know what you mean by the "external loop", but the tests below seem consistent with the rest of your comment. None of it is too surprising, IMHO. The vectorized version allocates the most memory, so it makes sense to me that the loop is fastest when full optimizations are applied.
n=1e7;
[dynpts,loc]=deal(rand(n,3),rand(1,3));
timeit(@()implem1(dynpts,loc))
ans = 0.0320
timeit(@()implem2(dynpts,loc))
ans = 0.0888
tic;
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
toc
Elapsed time is 0.112159 seconds.
tic
s = sum((dynpts-loc).^2,2);
toc
Elapsed time is 0.090275 seconds.
function implem1(dynpts,loc)
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
end
function implem2(dynpts,loc)
s = sum((dynpts-loc).^2,2);
end
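The memory argument can be made concrete with a rough sketch. It checks that the two forms agree and estimates the size of the largest temporary each one needs. The byte counts are back-of-the-envelope figures assuming 8-byte doubles, not a measurement of what MATLAB actually allocates internally, and the names sLoop, sVec, bytesLoop, and bytesVec are introduced here for illustration.

```matlab
n = 1e6;
[dynpts, loc] = deal(rand(n,3), rand(1,3));

% Loop form: every temporary is a single n-by-1 column.
sLoop = 0;
for dd = 1:numel(loc)
    sLoop = sLoop + (dynpts(:,dd) - loc(dd)).^2;
end

% Vectorized form: dynpts-loc is a full n-by-3 temporary.
sVec = sum((dynpts - loc).^2, 2);

% The two results agree up to floating-point rounding.
maxDiff = max(abs(sLoop - sVec));

% Rough size of the largest temporary in each form (8 bytes per double).
bytesLoop = 8 * n;
bytesVec  = 8 * n * numel(loc);
fprintf('max difference: %.3g, loop temp: %d bytes, vectorized temp: %d bytes\n', ...
        maxDiff, bytesLoop, bytesVec);
```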
