Save a large array into equal length .csv files?
Hi guys, I am trying to save a very large, modified data set into .csv files of equal length. I am using the following script from this link with my own database:
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);
%% Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%% Step 3 - use tall/write to emit .mat files
writeDir = tempname
mkdir(writeDir);
write(writeDir, tt);
%% Step 4 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname
mkdir(csvDir2);
parfor idx1 = 1 : N
    idx2 = 0;
    subds = partition(ds, N, idx1);
    while hasdata(subds)
        idx2 = 1 + idx2;
        fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
        writetable(read(subds), fname);
    end
end
I adapted Step 4 as follows so that each .csv file should contain 20000 rows:
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir,'ReadSize',RequiredDataRowsPerFile);
This works to some degree, since the setting does change the output. However, the resulting .csv files do not all contain the same number of rows (apart from the last file, which I expect to be shorter).
I would appreciate any help. Thanks
Tim
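A possible explanation, offered as an assumption rather than a confirmed diagnosis: ReadSize is an upper bound on the rows returned by each read call, and a read probably does not continue across file boundaries in writeDir, so the last chunk of each .mat file comes back short. Partitioning the datastore across parfor workers would add further short files, one per partition. The sketch below avoids both effects by reading serially and buffering rows until exactly the requested number is available. The function name writeEqualRowCsv and its arguments are hypothetical, not part of MATLAB or of the linked script.

```matlab
function nFiles = writeEqualRowCsv(ds, outDir, rowsPerFile)
% Hypothetical helper: write the contents of datastore ds to .csv files
% in outDir with exactly rowsPerFile rows each (the last file may be shorter).
% Assumes read(ds) returns a table, as in the script above.
buffer = [];   % rows read so far but not yet written
nFiles = 0;
while hasdata(ds)
    chunk = read(ds);              % may hold fewer rows than ReadSize
    if isempty(buffer)
        buffer = chunk;
    else
        buffer = [buffer; chunk];  %#ok<AGROW>
    end
    % Emit as many full files as the buffer currently allows.
    while height(buffer) >= rowsPerFile
        nFiles = nFiles + 1;
        fname = fullfile(outDir, sprintf('out_%06d.csv', nFiles));
        writetable(buffer(1:rowsPerFile, :), fname);
        buffer(1:rowsPerFile, :) = [];
    end
end
% Write whatever is left as the final, shorter file.
if ~isempty(buffer)
    nFiles = nFiles + 1;
    fname = fullfile(outDir, sprintf('out_%06d.csv', nFiles));
    writetable(buffer, fname);
end
end
```

With the variables from the script above, it could be called as writeEqualRowCsv(datastore(writeDir), csvDir2, 20000). This version gives up the parfor parallelism in exchange for a single running row count; keeping both would need each worker to know its global row offset in advance.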