Read big file with mixed data types with datastore
Sy Dat Ho
on 25 Nov 2024 at 13:30
Commented: Walter Roberson
on 29 Nov 2024 at 20:44
I have a 300 GB file; a sample of it is attached. I've read that the best way to handle files of this size is to read them through a datastore.
As you can see, the first two lines are text, while the remaining lines are a mix of floats and integers. Is it possible to specify the data types in advance? I know that fscanf lets you specify the data type, but when I use datastore it interprets every line as a string.
0 comments
Answers (1)
Stephen23
on 25 Nov 2024 at 14:05
ds = datastore('./*.txt', 'Type','tabulartext', 'NumHeaderLines',2, 'TextscanFormats',repmat("%f",1,5));
T = preview(ds)
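Once the preview looks right, the file can be processed piece by piece instead of being loaded whole. The following is a minimal sketch, not part of the original answer: the sample file, the tab delimiter, the `ReadSize` value, and the choice of which columns are integers (`%d`) are all assumptions made for illustration and need to be adapted to the real file.

```matlab
% Hypothetical sample file: two text header lines, then five
% tab-separated numeric columns (columns 3 and 4 assumed to be integers).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Sample header line one\n');
fprintf(fid, 'Sample header line two\n');
for k = 1:10
    fprintf(fid, '%.2f\t%.2f\t%d\t%d\t%.2f\n', k/2, k/4, k, 2*k, k/8);
end
fclose(fid);

% One textscan conversion specifier per column: %f -> double, %d -> int32.
ds = datastore(fname, 'Type','tabulartext', 'NumHeaderLines',2, ...
    'ReadVariableNames',false, 'Delimiter','\t', ...
    'TextscanFormats',{'%f','%f','%d','%d','%f'});
ds.ReadSize = 4;              % maximum number of rows returned per read()

nRows  = 0;
colSum = 0;
while hasdata(ds)
    T = read(ds);             % only this chunk is held in memory
    nRows  = nRows + height(T);
    colSum = colSum + sum(T.Var1);   % running sum of the first column
end

delete(fname);                % remove the temporary sample file
```

With `ReadVariableNames` set to false the columns are named `Var1` to `Var5`. For a 300 GB file a much larger `ReadSize` would be used, chosen so that one chunk fits comfortably in RAM.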
5 comments
Stephen23
on 29 Nov 2024 at 17:14
Edited: Stephen23
on 29 Nov 2024 at 17:29
FOPEN does not read a file into RAM.
Of course the details are likely more nuanced than that: possibly a small part of the file is buffered in RAM while other parts sit in virtual memory. In any case, I doubt that any language has an implementation of FOPEN that loads the entire file when FOPEN is called. That would be a terrible way to implement FOPEN.
Walter Roberson
on 29 Nov 2024 at 20:44
"I can't use fopen because my RAM is smaller than the file."
Replace
fid = fopen('test.txt','rt');
with
fid = fopen('test.txt','rt','n','US-ASCII');
Supplying the text encoding explicitly keeps the first fgetl() from scanning through the file to guess its encoding. The file is simply left positioned at the beginning, ready to be read piece by piece, and it does not need to be buffered in memory.
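As an illustration of reading piece by piece after such an fopen call, here is a minimal sketch. It is not from the original comment: the sample file, the column layout (`%f %f %d %d %f`), and the chunk size are assumptions and would have to match the real data.

```matlab
% Hypothetical sample file: two text header lines, then five
% whitespace-separated numeric columns.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Sample header line one\n');
fprintf(fid, 'Sample header line two\n');
for k = 1:10
    fprintf(fid, '%.2f %.2f %d %d %.2f\n', k/2, k/4, k, 2*k, k/8);
end
fclose(fid);

% Open with an explicit encoding, as suggested above.
fid = fopen(fname, 'rt', 'n', 'US-ASCII');
header1 = fgetl(fid);         % first text line
header2 = fgetl(fid);         % second text line

chunkRows = 4;                % rows per textscan call (assumed value)
nRows = 0;
while ~feof(fid)
    % Apply the format at most chunkRows times, then return.
    C = textscan(fid, '%f %f %d %d %f', chunkRows);
    if isempty(C{1})
        break                 % nothing parsed: stop rather than loop forever
    end
    nRows = nRows + numel(C{1});
    % ... process the chunk in C here ...
end
fclose(fid);

delete(fname);                % remove the temporary sample file
```

Each pass holds only `chunkRows` rows in memory, so the total file size does not matter; `%d` columns come back as int32 and `%f` columns as double.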