Efficient row comparison in a large dataset

I have two arrays: array 1 consists of 'a|b|c' rows and array 2 consists of rows containing a single string, for example 'ut'. Both arrays have the same length, 200,000+ rows.
From array 1, I need to filter out rows that have the same third value but a different first value for the same string in array 2.
For example:
row 1: 'a|b|c' and 'x'
row 2: 'f|g|c' and 'y'
row 3: 'a|b|c' and 'y'
row 4: 'd|e|c' and 'x'
In this situation I would like to delete row 4, because 'c' and 'x' are the same in rows 1 and 4, but 'a' and 'd' are different. None of the other rows meet these conditions, so they would not be deleted.
It is possible to write a for loop and compare each row separately, but that takes days (I tested it).
Any help would be greatly appreciated.

Answers (2)

John D'Errico on 3 Nov 2014

0 votes

help unique
This will serve your needs perfectly, and very efficiently.
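
As a rough illustration of how unique could be applied here, the sketch below groups rows by a combined key (third value of array 1 plus the string from array 2) and keeps only rows whose first value matches the first row seen in that group. The variable names (a1, a2, key, keep, and so on) are made up for this example, and the rule "the first occurrence in each group is the reference" is an assumption, since the question does not say which row of a conflicting pair should survive. It also assumes every row of array 1 has exactly three '|'-separated fields.

```matlab
% Example data from the question (column cell arrays of equal length).
a1 = {'a|b|c'; 'f|g|c'; 'a|b|c'; 'd|e|c'};
a2 = {'x'; 'y'; 'y'; 'x'};

% Split each row of a1 on '|' and stack the pieces into an n-by-3 cell array.
% Assumption: every row has exactly three fields.
parts = regexp(a1, '\|', 'split');
parts = vertcat(parts{:});
firstVal = parts(:, 1);
thirdVal = parts(:, 3);

% One key per row: third value of a1 joined with the string from a2.
key = strcat(thirdVal, '|', a2);

% g(k) is the group number of row k; rows with equal keys share a group.
[~, ~, g] = unique(key);
g = g(:);

% Index of the first row in each group, then that row's first value.
n = numel(a1);
firstIdx = accumarray(g, (1:n)', [], @min);
refFirst = firstVal(firstIdx(g));

% Assumed rule: keep rows whose first value matches the group's first row.
keep = strcmp(firstVal, refFirst);
a1_filtered = a1(keep);
a2_filtered = a2(keep);
```

On the four example rows this keeps rows 1 and 2 and drops rows 3 and 4. Row 3 is dropped because rows 2 and 3 share 'c' and 'y' but start with 'f' and 'a'. The question expects only row 4 to be removed, so the intended rule may differ from the one assumed here, and the grouping key or the keep condition would need adjusting accordingly. There is no loop over row pairs, so this should scale to 200,000+ rows far better than comparing each row against every other.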
Paul on 3 Nov 2014

0 votes

Thank you for your fast answer.
Is it possible to use unique(..) to find the unique combinations of rows? I would then like to find the unique combinations of the third value of array 1 ('a'|'b'|'this') and the value of array 2 on the same row.
When I tried a simple set:
s = {'ut','rtd';'hg','ry';'ut','rtd'}
[r,i,j] = unique(s,'rows')
This gives the unique strings ('hg','rtd','ry','ut'). However, I need the unique combinations (row 1: 'ut' and 'rtd', row 2: 'hg' and 'ry').
Is this possible?
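
One possible workaround, offered as a sketch and not as the definitive answer: unique does not support the 'rows' option for cell arrays of strings, which is why the call above returns individual strings. Joining the columns of each row into a single key string and calling unique on those keys gives row-wise uniqueness. The delimiter '|' and the names key and u are assumptions for this example; the delimiter must not occur inside any of the strings, and the 'stable' option (which preserves first-occurrence order) needs R2012a or later.

```matlab
s = {'ut','rtd'; 'hg','ry'; 'ut','rtd'};

% Join the two columns of each row into one key string.
% Assumption: '|' never appears inside the strings themselves.
key = strcat(s(:, 1), '|', s(:, 2));

% i indexes the first occurrence of each unique row;
% j maps every original row to its unique row, so u(j, :) reproduces s.
[~, i, j] = unique(key, 'stable');

% Unique row combinations, in order of first appearance.
u = s(i, :);
```

For the small set above, u contains two rows: 'ut','rtd' and 'hg','ry'. Dropping 'stable' returns the same combinations sorted by key instead.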

Asked: 3 Nov 2014
Answered: 3 Nov 2014
