Remove duplicate rows in CSV file

Hello dear MathWorkers,
I have a dataset consisting of approximately 4 million records, and I want to remove the duplicated rows. Can anyone help me with a way to do this? I am using MATLAB R2018a. Thanks in advance.

7 comments

madhan ravi on 23 Jul 2019
Upload a sample file.
Alex Mcaulley on 23 Jul 2019
You can use unique with the 'rows' option.
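As a minimal sketch of what the 'rows' option does, assuming a small made-up numeric matrix A (not the poster's data): by default unique returns the distinct rows in sorted order, and adding 'stable' keeps them in order of first appearance.

```matlab
% Hypothetical 4-by-3 matrix; row 3 is an exact copy of row 1.
A = [4 5 6; 1 2 3; 4 5 6; 1 2 4];

% Distinct rows, sorted in ascending order (default behaviour).
sortedRows = unique(A, 'rows');            % [1 2 3; 1 2 4; 4 5 6]

% Distinct rows, kept in order of first appearance.
stableRows = unique(A, 'rows', 'stable');  % [4 5 6; 1 2 3; 1 2 4]
```

Rows [1 2 3] and [1 2 4] differ in one column, so both are kept; only rows that match in every column are collapsed.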
mohammad Alsajri on 23 Jul 2019
Edited: mohammad Alsajri on 23 Jul 2019
Dear madhan ravi, this is a sample of the whole data.
Shameer Parmar on 23 Jul 2019
@Mohammad Alsajri: You mentioned that the rows are duplicated, but I searched your sample sheet for duplicated data and could not find any.
For example, if I filter the data for the value '215' (in column E), I still don't see duplicated values in the other columns, such as columns 'W' and 'X'.
Capture.JPG
So let us know what you mean by duplicate entries and how they should be identified.
mohammad Alsajri on 24 Jul 2019
@Shameer Parmar: duplicate means that an entire row matches another row in all columns. This is just a sample; the full data has 4 million records, so of course there are duplicated rows.
madhan ravi on 24 Jul 2019
Mohammad: Alex's solution should have solved your problem.
mohammad Alsajri on 25 Jul 2019
Thanks for the help, guys.


Accepted Answer

Alex Mcaulley on 23 Jul 2019
Since all of the data is numeric, you can use:
data = xlsread('kdd.xlsx');        % read the numeric data from the spreadsheet
datanew = unique(data,'rows');     % keep one copy of each distinct row (sorted)
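Since the question title mentions a CSV file while the code above reads an .xlsx file, here is a hedged sketch of the same idea for a purely numeric CSV in R2018a. The file names and demo values are made up for illustration, and csvread/dlmwrite are used on the assumption that the file has no header row or text columns (readtable and writetable would be an alternative for mixed data).

```matlab
% Build a tiny demo CSV so the sketch runs on its own (hypothetical data).
inFile  = fullfile(tempdir, 'demo_in.csv');
outFile = fullfile(tempdir, 'demo_out.csv');
dlmwrite(inFile, [1 2 3; 4 5 6; 1 2 3]);    % row 3 duplicates row 1

% Read the numeric CSV and drop rows that match an earlier row exactly.
data    = csvread(inFile);
datanew = unique(data, 'rows', 'stable');   % 'stable' keeps the original order

% Write the result; raise the precision because dlmwrite defaults to
% 5 significant digits, which could truncate real-valued data.
dlmwrite(outFile, datanew, 'precision', 16);
```

Note that unique compares values exactly, so two rows that differ only by floating-point round-off are treated as different rows.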

2 comments

Shameer Parmar on 23 Jul 2019
This is not working, because none of the data is identical. I can't find any duplicate entries in the sheet provided by Mohammad Alsajri.
With your command, 'data' and 'datanew' come out exactly the same.
Alex Mcaulley on 23 Jul 2019
This code works!
I guess the Excel file provided by Mohammad is just a small portion of the dataset (4 million rows).
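One way to check whether a given file actually contains fully duplicated rows is to compare the row counts before and after unique and to look at which rows were dropped. The sketch below uses a made-up matrix in place of the xlsread output; the variable names nRemoved and dupIdx are illustrative only.

```matlab
% Hypothetical stand-in for the numeric matrix returned by xlsread.
data = [1 2 3; 4 5 6; 1 2 3; 7 8 9; 4 5 6];

% ia holds the index of the first occurrence of each distinct row.
[datanew, ia] = unique(data, 'rows', 'stable');

% Number of rows that unique removed; 0 means no full-row duplicates.
nRemoved = size(data, 1) - size(datanew, 1);

% Indices of the rows that repeat an earlier row (here rows 3 and 5).
dupIdx = setdiff((1:size(data, 1))', ia);

fprintf('%d of %d rows were duplicates\n', nRemoved, size(data, 1));
```

If nRemoved is 0, as reported for the sample sheet, 'data' and 'datanew' contain the same rows and nothing needs to be removed from that file.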


More Answers (0)

Asked: 23 Jul 2019

Commented: 25 Jul 2019
