MATLAB uses much more memory reading a file

Changyue Song on 24 Apr 2017
Commented: Walter Roberson on 26 Apr 2017
Hi, I have a CSV file with 22 columns and 871,000 rows. Columns are separated by commas, and each value is text or a number enclosed in quotes. The file is about 150 MB, but after I read it into MATLAB using textscan, the variable that stores the data takes up about 2 GB of memory! Another CSV file with a similar structure, 47 columns and 7,000,000 rows, is about 2 GB on disk, and MATLAB takes forever to read it using textscan. However, R is able to read both files, and the memory it uses is approximately the same as the file size on disk. Is there an explanation for this? Thanks.
PS: Thank you for your help, guys. Yes, I can process the files line by line and discard the columns that I do not need. I am just curious why MATLAB uses so much more memory than R, or than the original file takes on disk.
  8 comments
per isakson on 26 Apr 2017
Edited: per isakson on 26 Apr 2017
You write that each value is text or a number enclosed in quotes. Given the quotes, there is no alternative to your format specifier, afaik.
MATLAB has recently become better at handling large files and strings. Which release are you using?
Are the sizes of the different strings approximately equal?
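To get a feel for why quoted text columns can cost far more memory than the file size suggests, one can compare how many bytes the same values occupy as a numeric column and as a cell array of character vectors, which is what textscan returns for text fields. This is only a rough sketch: the variable names are made up for illustration, and the explanation assumed here (each cell carries its own header, and char stores 2 bytes per character) may not account for everything in the original files.

```matlab
% Hypothetical comparison: the same 1000 values stored two ways.
n = 1000;
asNumbers = (1:n)';                                   % n-by-1 double column
asText = arrayfun(@(v) sprintf('%d', v), asNumbers, ...
                  'UniformOutput', false);            % n-by-1 cell of char vectors

infoNum = whos('asNumbers');                          % struct with a .bytes field
infoTxt = whos('asText');

fprintf('double column:       %d bytes\n', infoNum.bytes);
fprintf('cell array of chars: %d bytes\n', infoTxt.bytes);
fprintf('ratio:               %.1f\n', infoTxt.bytes / infoNum.bytes);
```

If the ratio is large on your release, converting columns that are really numeric to double right after reading them should bring memory use down considerably.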
Walter Roberson on 26 Apr 2017
If there are fields known to be numeric but enclosed in quotes anyhow, then you can code a literal " character in the textscan format.
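A minimal sketch of that idea, assuming a small made-up file with three quoted columns (the file contents, column order, and variable names are hypothetical, not taken from the original data). The literal " characters around %f let textscan return the first column as double, %q reads a quoted text field with the quotes stripped, and %*q skips a column that is not needed:

```matlab
% Write a tiny demo file; replace this with your own CSV.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '"1.5","abc","unused"\n"2.5","def","unused"\n');
fclose(fid);

fid = fopen(fname, 'r');
% "%f"  : numeric field wrapped in literal double quotes
% %q    : quoted text field, returned without the quotes
% %*q   : quoted text field that is read and discarded
C = textscan(fid, '"%f" %q %*q', 'Delimiter', ',');
fclose(fid);
delete(fname);

nums = C{1};   % double column
txt  = C{2};   % cell array of character vectors
```

Reading numeric fields directly as double and dropping unneeded columns at parse time should keep the resulting variable much smaller than reading every field as text.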


Answers (1)

AstroGuy1984 on 25 Apr 2017
Do you need to read the whole matrix at once, or are you just running a routine line by line? If you are just running something on each line, I'd suggest NOT reading the whole file at once and instead doing something like:
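fid = fopen('file.dat', 'r');
if fid == -1
    error('Could not open file.dat');
end
tline = fgetl(fid);          % read the first line (newline removed)
while ischar(tline)          % fgetl returns -1 (numeric) at end of file
    % STUFF YOU WANT TO DO with tline
    tline = fgetl(fid);      % read the next line
end
fclose(fid);                 % DON'T FORGET THIS!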
Unfortunately, when handling huge data files you eventually have to get creative. I would also suggest looking into functions such as fseek and frewind if you need to pull specific entries out of a huge database. It will save you a ton of time and effort if you're just referencing the text file for something.
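As a rough illustration of the fseek/frewind suggestion, the sketch below scans a file once to record the byte offset where each line starts (using ftell), then jumps straight to a chosen line without re-reading the ones before it. The demo file and the variable names (offsets, line3, line1) are made up for the example.

```matlab
% Write a tiny demo file; replace this with your own data file.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '"a","1"\n"b","2"\n"c","3"\n');
fclose(fid);

fid = fopen(fname, 'r');
offsets = [];                      % byte offset of the start of each line
pos = ftell(fid);                  % position before reading the first line
tline = fgetl(fid);
while ischar(tline)
    offsets(end+1) = pos;          %#ok<AGROW> acceptable for a sketch
    pos = ftell(fid);              % position before reading the next line
    tline = fgetl(fid);
end

fseek(fid, offsets(3), 'bof');     % jump directly to the start of line 3
line3 = fgetl(fid);

frewind(fid);                      % return to the beginning of the file
line1 = fgetl(fid);

fclose(fid);
delete(fname);
```

For a really large file, the offsets could be computed once and saved, so later lookups only need an fseek and a single fgetl.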
