speed up intersect, setdiff functions

Hi
I have some code, which when I examine using the Matlab Profiler shows that the majority of the time is spent on the intersect and setdiff functions, which are invoked many times during a looping process. I was wondering if there was a clever way to speed this up? For example is it possible to somehow vectorize the following code to avoid invoking the inbuilt set functions? Thanks for your help!
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;

1 comentario

Matt Fig
Matt Fig el 15 de Nov. de 2012
Your best bet might be to avoid performing these operations in a loop in the first place...

Iniciar sesión para comentar.

 Respuesta aceptada

Jan
Jan el 15 de Nov. de 2012

1 voto

intersect and setdiff both sort the inputs. Therefore you can omit at least one sorting, when you copy the relevant parts of both function to a new file. Then ismembc will do its work once, as Sean has suggested already.

Más respuestas (2)

Jonathan Epperl
Jonathan Epperl el 15 de Nov. de 2012
If your sets aren't much larger than your example, and are indeed made up of positive integers, then those two functions here can get you a drastic increase in speed. The idea isn't mine, afaik Kevin Murphy of the Bayes Net TB came up with it. Anyway, 2 functions which pretty much exactly replace setdiff and intersect (as long as you work with positive integers):
function C = MY_intersect(A,B)
if ~isempty(A)&&~isempty(B)
P = zeros(1, max(max(A),max(B)) ) ;
P(A) = 1;
C = B(logical(P(B)));
else
C = [];
end
and
function Z = MY_setdiff(X,Y)
if ~isempty(X)&&~isempty(Y)
check = false(1, max(max(X), max(Y)));
check(X) = true;
check(Y) = false;
Z = X(check(X));
else
Z = X;
end
Obviously, if you're confident your inputs will make sense you can get rid of the if stuff.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;
% Elapsed time is 0.028182 seconds.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
MY_intersect(1+a,1+b)-1;
MY_setdiff(1+a,1+b)-1;
toc;
% Elapsed time is 0.016935 seconds.
The computation times vary a lot though, but the 2nd block is always at least around 30% faster. You can see how it plays out in your code and let us know.

3 comentarios

Jdeen
Jdeen el 21 de Mayo de 2014
Editada: Jdeen el 21 de Mayo de 2014
Unfortunately, MY_setdiff does not quite behave the same as setdiff, also not for positive integers. Consider: setdiff([1 2 2 1], [1]) = 2 and MY_setdiff([1 2 2 1], [1]) = 2 2. Hence, you would have to include a Z = unique(Z) which reduces speed. However, MY_setdiff generally still seems to be faster for pos. integers
Jonathan Epperl
Jonathan Epperl el 18 de Ag. de 2014
That's a good point, I never realized that.
Chinmaya KA
Chinmaya KA el 9 de Sept. de 2020
Hi Jonathan, how can we intersect multiple arrays with positive integers using your code?

Iniciar sesión para comentar.

Sean de Wolski
Sean de Wolski el 15 de Nov. de 2012

0 votos

Look at ismember and ismembc (two output form).
You will have to implement the and/or/xor logic yourself, but these should quickly give you the indices you need to do so.

1 comentario

Ethan Kyzivat
Ethan Kyzivat el 9 de Mayo de 2018
I found that ismember also calls unique(), so may not be much faster...

Iniciar sesión para comentar.

Categorías

Preguntada:

el 15 de Nov. de 2012

Comentada:

el 9 de Sept. de 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by