Deleting duplicates based on conditions of multiple columns
32 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Nick
el 28 de Dic. de 2020
Respondida: Akash kumar
el 31 de Jul. de 2022
Hi,
I have a large dataset (100m rows x 40 columns ) and I would like to delete any row that has duplicates on a few specific columns. See example below:
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];
I would like to delete all rows where the three columns have duplicate within each specific column. So in this example, row 2, 4 and 9 would be deleted because e.g.
row 1 and 2 have duplicates in each of the three columns and so I'd want to delete one of the two (doesn't matter which one).
I suspect the answer is somewhere along the use of unique and logical indexing but haven't managed to figure it out. Any help would be much appreciated. (I'm using Matlab 2018b)
Thanks
3 comentarios
Respuesta aceptada
Más respuestas (1)
Akash kumar
el 31 de Jul. de 2022
% With Index Number:- Shows the which index or Row value is extract from
% the A Matrix. I thinks, It can help you.
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24]';
[B index]=unique(AA(1:3,:).','rows', 'stable')
0 comentarios
Ver también
Categorías
Más información sobre Cell Arrays en Help Center y File Exchange.
Productos
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!