Separating data from a CSV file

19 views (last 30 days)
James
James on 3 Mar 2020
Commented: Star Strider on 3 Mar 2020
Hello,
I have a huge CSV file and it has data stored as follows.
Column A stores ID numbers and the Column B stores dates (yyyymmdd hhmmss). Other columns are random.
I want to seperate the data based on ID numbers (Column A) and store them in different *.csv or *.txt files. The problem is that the number of data sets per one ID number is also random.
Ex: ID 102030 has 3 data while 125125 has 4 data
My CSV file has over 10,000 data rows, which makes is quite difficult to do this manually.
Is there a way to use MATLAB to sort this out? I know this is a big task but it would be a great help if anyone can at least point me in the right direction.
Thank you in advance!

Accepted Answer

Star Strider
Star Strider on 3 Mar 2020
It woiuld be necessary to convert column B to strings in order for datetime to parse it, however that is not difficult. Grouping them is strraightforward, using either findgroups or the third output of unique. The accumarray function might be the best way to segregate them.
Example —
a = rand(10, 5); % Create Matrix
A = [[3 3 3 4 4 6 6 6 6 7]' a]; % Create Matrix
G = findgroups(A(:,1));
M = accumarray(G, (1:size(A,1)).', [], @(x){A(x,:)});
Cell_1 = [M{1}]
Cell_3 = [M{3}]
This does not include the datetime calls, howefver that would be straightforward if the date and time numbers needed to be converted.
  5 Comments
Star Strider
Star Strider on 3 Mar 2020
As always, my pleasure!
That should not make any difference, even if the ‘ID’ values are not contiguous in the original file. The tables will have them in the order they appear in the original file, and the individual files will be the same as the cell arrays accumarray created, that are then created as tables.
Note that the tables also give you the ability to label the columns (variables) to be what you want. See the documentation on table (and associated documentation) for those options.
Also, I would put them all in the same folder/subdirectory so you can find them easily. Use fullfile to create the appropriate filenames (including the subdirectory path), and use addpath to add that path to your MATLAB search path. See What Is the MATLAB Search Path? for those details, if you are not familiar with them.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by