How to find which elements for each row of matrix B (NxM) are contained in matrix A (NxK), in their corresponding row n using vectorization?

3 visualizaciones (últimos 30 días)
How to find which elements for each row of matrix B (NxM) are contained in matrix A (NxK), in their corresponding row n?
Toy example:
A = [1 2 3; 4 5 6; 7 8 9]
A =
1 2 3
4 5 6
7 8 9
B = [0 7 9 1 8; 1 15 5 15 4; 1 2 20 5 9]
B =
0 7 9 1 8
1 15 5 15 4
1 2 20 5 9
Which elements of B are contained in A (restricted to their corresponding row?): The answer should be a matrix C of same dimensions as B (NxM)
C =
0 0 0 1 0
0 0 1 0 1
0 0 0 0 1
For each column vector of matrix B, it should check if it is contained in A, but only row-wise. For instance, although B(2,1)=1 is contained in A(1,1), they belong to different rows (row 2 vs row 1).
The only solution I have is to use a sum inside a for loop:
C = [];
for i = 1:size(b, 2)
C = [C sum(b(:, i) == a, 2)];
end
C
C =
0 0 0 1 0
0 0 1 0 1
0 0 0 0 1
However, for very big matrices this is very slow. Is there a way I can vectorize this?
PS: because of a warning ("warning: mx_el_eq: automatic broadcasting operation applied"), I had started using sum(bsxfun(@eq, B(:, 4), A), 2) instead of sum(B(:, i) == A, 2), but running a small tic toc experiment it seems that the latter form without bsxfun runs faster and therefore is preferred?
  1 comentario
Stephen23
Stephen23 el 24 de Feb. de 2016
Editada: Stephen23 el 24 de Feb. de 2016
@George Aipal: Your method of expanding C on every iteration is very memory inefficient. Because you concatenate (and expand) the array on each iteration MATLAB has to check its size and move it to larger memory when required. Slow! This is a topic that has been covered many times on this forum (and many others). Use your favorite internet search engine to find the thousands of explanations why this is slow code and how to fix it: search for "MATLAB array preallocation".

Iniciar sesión para comentar.

Respuesta aceptada

Stephen23
Stephen23 el 24 de Feb. de 2016
Editada: Stephen23 el 24 de Feb. de 2016
No loops are required.
Method one: permute
>> any(bsxfun(@eq,B,permute(A,[1,3,2])),3)
ans =
0 0 0 1 0
0 0 1 0 1
0 0 0 0 1
Method two: reshape
>> any(bsxfun(@eq,B,reshape(A,size(A,1),1,[])),3)
ans =
0 0 0 1 0
0 0 1 0 1
0 0 0 0 1
  6 comentarios
Guillaume
Guillaume el 27 de Feb. de 2016
Well, not exactly like your original code which wasn't efficient. You'll have to fall back to the loop method in my answer.
George Aipal
George Aipal el 27 de Feb. de 2016
Editada: George Aipal el 27 de Feb. de 2016
Thanks Stephen. I was wondering if there was a way to avoid the temporary big matrix that gets created by 'bsxfun' before it gets reduced by 'any', because if there was a different way, then in most cases your vectorized code would work greatly in my problem in most cases Matrix A is about 12540x1x340, matrix B is roughly 12540xN (with N being from 3000 to 30k or up to 60k), but the bsx function generates a matrix of 12540xNx340, and this is the unmanageable one. I understand from you there is no other vectorized implementation that would avoid creating the 12540xNx340 matrix (before it gets reduced by 'any', therefore I will have to use the for loop, otherwise please let me know. Thanks!
PS: and thanks Guillaume, I will certainly preallocate the matrix and try your code too!

Iniciar sesión para comentar.

Más respuestas (2)

Guillaume
Guillaume el 24 de Feb. de 2016
I don't think there's going to be a faster way than iterating over the rows to do what you want. The way you've constructed your loop is very inefficient though, for two reasons:
  • you don't preallocate the output C and instead grow it on each loop iteration, meaning that matlab allocates a new array and copy the old one on each iteration. A big cause of slowdowns for large arrays.
  • the way you test for membership is inefficient. Use ismember on rows of A and B instead of iterating over the columns of B and comparing to the whole A matrix.
So:
C = zeros(size(B)); %preallocation. no reallocation in the loop
for rowidx = 1:size(B, 1)
C(rowidx, :) = ismember(B(rowidx, :), A(rowidx, :)); %much faster membership test
end
You could also rewrite the above as the one-liner (no preallocation needed):
C = cell2mat(cellfun(@ismember, num2cell(B, 2), num2cell(A, 2), 'UniformOutput', false))
No guarantee it will be faster than the explicit loop.
  3 comentarios
Guillaume
Guillaume el 24 de Feb. de 2016
D'oh! Of course, there is a way to do it without a loop. See Stephen's answer.
My comments on the inefficiencies of your loop still stand.

Iniciar sesión para comentar.


George Aipal
George Aipal el 27 de Feb. de 2016
I should probably post my question as a new comment: although I accepted the solution provided by Stephen as it worked nicely in the toy example, once I tried it on the real data it did not work, since that solution temporarily generates an immensely large matrix before reducing its size at the end: "this solution made me run out of memory. After some analysis, it seems that the problem lies in the bsxfun part, this creates a temporarily massive matrix with a number of elements in the order of 2e+11 in it. Then the 'any(bsxfun, 3)' part reduces it to a manageable size, but the temporarily huge matrix generated by bsxfun is just too big. Is there a more memory efficient way to avoid creating the temporary massive matrix that bsxfun generates?" Stephen's solution was: option 1: any(bsxfun(@eq,B,permute(A,[1,3,2])),3) option 2: any(bsxfun(@eq,B,reshape(A,size(A,1),1,[])),3)

Categorías

Más información sobre Logical en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by