Read/Write large CSV file

Atif Shah on 9 Jul 2018
Commented: Jan on 11 Jul 2018
I write a matrix of size 1721x196609 to a CSV file using csvwrite. When I read the file back with csvread, it gives me a 338364089x1 array, but I need the original size of 1721x196609. However, when I reduce the matrix to 1721x96000, roughly half the size, it works perfectly. How can I get the matrix back in its original size when I read the CSV file?
Thank you in advance.
  2 comments
OCDER on 9 Jul 2018
Can you show us the code? It's odd that csvwrite & csvread will work differently based on the matrix size.
Atif Shah on 9 Jul 2018
I am using the following code to write and read the matrix M.
csvwrite('csvLargeFile.csv', M);
read_matrix = csvread('csvLargeFile.csv');


Accepted Answer

OCDER on 9 Jul 2018
Edited: OCDER on 9 Jul 2018
Instead of saving as CSV, saving as a binary file may be better for transporting the data, unless a human is going to read it manually.
Try this:
M = zeros(1721, 196609);
FileName = 'csvLargeFile.dat';
%To write
FID = fopen(FileName, 'w');
fwrite(FID, M, 'double');
fclose(FID);
%To load
FID = fopen(FileName, 'r');
A = fread(FID, [1721, Inf], 'double');
fclose(FID);
  3 comments
OCDER on 9 Jul 2018
Another way is to make a custom file format that stores the matrix size as the first two doubles in the file.
FID = fopen(FileName, 'w');
fwrite(FID, size(M), 'double'); %First 2 double is the size of the matrix
fwrite(FID, M, 'double');
fclose(FID);
FID = fopen(FileName, 'r');
Size = fread(FID, 2, 'double'); %Get the first 2 double and assume it's the size
A = fread(FID, [Size(1), Size(2)], 'double');
fclose(FID);
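But yes, a .mat file would be best for transporting data across MATLAB sessions. It would need the '-v7.3' option in this case because the matrix is larger than 2 GB (1721 x 196609 doubles is about 2.7 GB). Note that save takes the variable name as a string.
save('myLargeMatrix.mat', 'M', '-v7.3')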
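A minimal round-trip sketch for the .mat route, using a small stand-in matrix and a hypothetical file name (demoMatrix.mat) rather than the real 1721x196609 data. It only illustrates that the variable is passed to save by name and comes back with its shape intact:

```matlab
% Small stand-in matrix; the real one in this thread is 1721x196609.
M = rand(4, 6);

% Hypothetical file name, chosen only for this demo.
matFile = 'demoMatrix.mat';

% save expects the variable NAME as a character vector, not the variable itself.
save(matFile, 'M', '-v7.3');

% Load into a struct so the workspace copy of M is not overwritten.
S = load(matFile, 'M');

sameSize   = isequal(size(S.M), size(M));   % shape is preserved
sameValues = isequal(S.M, M);               % values are preserved exactly

delete(matFile);                            % clean up the demo file
```

Loading into the struct S keeps the comparison honest, since the loaded copy and the original stay separate.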
Atif Shah on 10 Jul 2018
Edited: Atif Shah on 10 Jul 2018
Thank you! Yes, it's better to save as mat files.


More Answers (1)

dpb on 9 Jul 2018
Edited: dpb on 9 Jul 2018
Confirmed behavior w/ R2017b; it's an issue with record length and textscan it appears...I didn't explore just where it actually breaks.
csvread simply calls dlmread with the comma delimiter and dlmread uses textscan internally with the default empty format string which normally will return the array shape as found in the file.
Looks like time for bug report...apparently internal logic has some line limitation in record size.
xlsread returns right data for the subsection it reads but only a 2x16384 subset. That's in the COM engine so not a reportable bug to TMW; I don't know what modern Excel lengths are; I thought they had been moved up to 32-bit but whether that really works or not I didn't try directly.
venerable textread is trying but hasn't yet returned to command prompt after a couple minutes...
One could try a specific format string in textscan and see if that's a workaround; of course that presumes one knows the number of fields per record a priori. One could scan the first record to determine that, using fgetl and sum(fgetl(fid)==',') to count delimiters (the field count is one more than the delimiter count), or reshape the returned vector based on that count.
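A rough sketch of that idea, assuming a purely numeric, comma-delimited file with no header line and the same number of fields on every line. The file name demo_wide.csv and the small demo matrix are made up for illustration; the real file in this thread is far wider:

```matlab
% Hypothetical small demo file so the sketch is self-contained.
fname = 'demo_wide.csv';
x = rand(3, 50);
dlmwrite(fname, x, 'precision', '%.17g');   % comma-delimited by default

fid = fopen(fname, 'r');
firstLine = fgetl(fid);                 % read the first record as text
nCols = sum(firstLine == ',') + 1;      % fields = delimiters + 1
frewind(fid);                           % back to the start before the real read

% One %f per field, so textscan does not have to infer the record shape.
fmt = repmat('%f', 1, nCols);
y = cell2mat(textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1));
fclose(fid);

delete(fname);                          % clean up the demo file
```

The count is taken from the first line only, so a file with ragged rows would need extra checking.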
ADDENDUM: Had to force-close MATLAB to terminate textread. After that, explicit use of textscan shows:
fmt=repmat('%f',1,length(x)); % x=rand(2,196609);
fid=fopen('atif.csv');
y=cell2mat(textscan(fid,fmt,'delimiter',',','collectoutput',1));
whos y
  Name      Size            Bytes    Class     Attributes
  y         2x196609        3145744  double
An explicit format string works as expected.
frewind(fid)
y=cell2mat(textscan(fid,'','delimiter',',','collectoutput',1));
whos y
  Name      Size            Bytes    Class     Attributes
  y         393218x1        3145744  double
The problem is in the internal default behavior when no explicit format string is given (as far as I can tell currently undocumented, although it used to appear in an example): it is supposed to return the shape of the input file, but breaks at some undetermined record length.
fid=fclose(fid);
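Since 338364089 = 1721 x 196609 (and 393218 = 2 x 196609 in the test above), the column vector appears to hold every value, just without the shape. A hedged sketch of the reshape workaround follows, assuming the values come back in file order (row 1 first, then row 2, and so on), which is not verified here for the failing case. The small sizes are stand-ins; for the original file nCols would be 196609:

```matlab
% Stand-in sizes for the demo.
nRows = 3;
nCols = 5;

% Original matrix and a simulated "flattened" read in row-by-row order.
M = reshape(1:nRows*nCols, nCols, nRows).';
v = reshape(M.', [], 1);

% The element count must be a whole multiple of the column count.
assert(mod(numel(v), nCols) == 0);

% MATLAB fills column-wise, so build nCols-long columns and transpose
% to turn each of them back into a row of the original matrix.
A = reshape(v, nCols, []).';

recovered = isequal(A, M);
```

Note that reshape(v, nRows, nCols) without the transpose would scramble the data, because MATLAB stores arrays column-major while the file is written row by row.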
  1 comment
Jan on 11 Jul 2018
Atif Shah wrote:
Thank you for the nice explanation.
@Atif Shah: Please use flags only to inform admins and editors about inappropriate content like spam or rudeness.
