Textscan with very large .dat files, Matlab keeps crashing
mashtine
on 24 Apr 2014
Edited: Jeremy Hughes
on 13 Mar 2017
Hey everyone,
I am using R2013b to read in 35 very large files, but I am running into memory problems and MATLAB usually crashes before the files are loaded. I am using textscan, and I was hoping someone could help me edit the code so that it loads the data a block at a time, or at least make it less memory intensive. I need all the years in one large cell array.
Any ideas?
Many thanks!
tic;
HWFiles = {'midas_wind_197901-197912.txt', 'midas_wind_198001-198012.txt'}; % ... and so on, up to 2013
HWData = cell(1, numel(HWFiles));
for i = 1:numel(HWFiles)
    fid = fopen(HWFiles{i}, 'r');
    if fid < 0
        error('Could not open %s', HWFiles{i});
    end
    % Fields marked %* are read but discarded, so only 8 of the 24 fields per line are stored
    HWData{i} = textscan(fid,'%s %*s %*f %*f %s %*f %f %*f %f %f %f %f %f %*f %*f %*f %*f %*f %*s %*f %*f %*s %*f %*f', 'Delimiter',',');
    fclose(fid);
end
toc;
Accepted Answer
Walter Roberson
on 24 Apr 2014
Run through the files, reading each one with textscan(), but instead of keeping all the results in memory, use matfile (which returns a matlab.io.MatFile object) to append the new data to the end of a variable in a .mat file.
Once that is done, you can start a new MATLAB session and load() the .mat file to get the combined cell array.
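A minimal sketch of this idea, assuming only the numeric columns need to be combined. The two tiny input files, their two-column layout, and the variable name data are hypothetical stand-ins for the real MIDAS files. Each file's block is written straight into the MAT-file on disk through matfile, so at most one file's textscan output is in memory at a time. Partial writes like this need a version 7.3 MAT-file, which is what matfile creates for a new file. The string (%s) columns would need separate handling, since this sketch only grows a numeric matrix.

```matlab
% Hypothetical stand-in input files (two numeric columns, comma-separated)
folder = tempname;
mkdir(folder);
inFiles = {fullfile(folder, 'part1.txt'), fullfile(folder, 'part2.txt')};
fid = fopen(inFiles{1}, 'w'); fprintf(fid, '1,2\n3,4\n'); fclose(fid);
fid = fopen(inFiles{2}, 'w'); fprintf(fid, '5,6\n');      fclose(fid);

outFile = fullfile(folder, 'combined.mat');
m = matfile(outFile, 'Writable', true);   % new file -> v7.3 MAT-file on first write
nRows = 0;                                 % rows written to m.data so far
for k = 1:numel(inFiles)
    fid = fopen(inFiles{k}, 'r');
    if fid < 0
        error('Could not open %s', inFiles{k});
    end
    C = textscan(fid, '%f %f', 'Delimiter', ',');
    fclose(fid);
    block = [C{:}];                        % N-by-2 double for this file only
    n = size(block, 1);
    m.data(nRows+1:nRows+n, 1:2) = block;  % grows the variable on disk
    nRows = nRows + n;
end
clear m                                    % release the MatFile object

% Later (for example in a fresh MATLAB session), load the combined result
S = load(outFile);
disp(S.data)
```

The final load() still has to fit the combined array in memory; the saving is that the per-file intermediates are never all held at once.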
More Answers (3)
per isakson
on 24 Apr 2014
Edited: per isakson
on 24 Apr 2014
- "load in the data a block at a time": by "block", do you mean part of a file?
- converting from double to single halves the memory needed for the numeric data
- your format specifier already skips many columns ("%*f")
- "very large files": do these files contain hourly weather data? How large are they?
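To put a number on the double-to-single point: single uses 4 bytes per value instead of 8. The sketch below measures this with whos, and also shows that textscan can return single directly when %f is replaced by %f32 in the format, which avoids creating the double copy in the first place. The variable names and the sample string are illustrative only.

```matlab
% Same values stored as double and as single
x  = rand(1e6, 1);      % 1e6 doubles
xs = single(x);         % same values in single precision
sx = whos('x');
ss = whos('xs');
fprintf('double: %d bytes, single: %d bytes\n', sx.Bytes, ss.Bytes);

% textscan can produce single directly: use %f32 instead of %f
C = textscan('1.5,2.5', '%f32 %f32', 'Delimiter', ',');
disp(class(C{1}))       % single
```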
With textscan you can read N lines at a time:
C = textscan(fileID, formatSpec, N)
But that will probably not help.
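Reading in blocks only lowers peak memory if each block can be reduced (summarized, converted, or written out) before the next one is read; if everything has to end up in memory anyway, it does not help much. A minimal sketch of such a loop, assuming a hypothetical two-column comma-separated file and a running column sum as the reduction:

```matlab
% Hypothetical stand-in file: 10 lines "1,11", "2,12", ..., "10,20"
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d,%d\n', [1:10; 11:20]);
fclose(fid);

N = 4;                          % format repetitions (here: lines) per block
fid = fopen(fname, 'r');
colSum = zeros(1, 2);
nLines = 0;
while ~feof(fid)
    C = textscan(fid, '%f %f', N, 'Delimiter', ',');  % continues where the last call stopped
    block = [C{:}];
    if isempty(block)
        break
    end
    colSum = colSum + sum(block, 1);   % keep only the running summary
    nLines = nLines + size(block, 1);
end
fclose(fid);
fprintf('%d lines, column sums: %g %g\n', nLines, colSum);
```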
I imagine there are many ways to reduce the memory requirement. However:
- what data do you need to have simultaneously in memory to do the calculations?
- could the files be downloaded from the net? Or could you attach a file to the question?
8 comments
José-Luis
on 25 Apr 2014
Edited: José-Luis
on 25 Apr 2014
Sure, HDF is fine, as long as you don't need relational capabilities in your database. That seems to be the case for the OP.
By database program I meant whatever is designed to handle large amounts of data. I have no idea how good the bindings between MATLAB and HDF are, since I have never tried them. I only tried NetCDF, a long time ago, and I was not impressed.
What I meant by my comment is that you really shouldn't use Matlab to store and handle large amounts of data. It will be slow and your computer will choke really fast.
per isakson
on 25 Apr 2014
Edited: per isakson
on 27 Apr 2014
"designed to handle large amounts of data": HDF5 meets that definition. HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.
To discuss data storage, we need a better description of the use case than the one provided by the OP.
I did an evaluation regarding storing time series from building automation systems and I settled on HDF5. And I'm happy with that choice.
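For reference, MATLAB's high-level HDF5 functions (h5create, h5write, h5read) can append blocks to a dataset that is unlimited along one dimension. A sketch under these assumptions: the file, the dataset name /wind, the two-column single-precision layout, and the in-memory stand-in blocks are all hypothetical. h5create requires ChunkSize whenever a dimension is Inf.

```matlab
h5file = [tempname '.h5'];
nCols = 2;
% Rows are unlimited (Inf); chunking is mandatory for an unlimited dimension
h5create(h5file, '/wind', [Inf nCols], 'ChunkSize', [1000 nCols], 'Datatype', 'single');

blocks = {single([1 2; 3 4]), single([5 6])};   % stand-ins for per-file data
nRows = 0;                                       % rows written so far
for k = 1:numel(blocks)
    b = blocks{k};
    % start = first row/column to write, count = size of the block
    h5write(h5file, '/wind', b, [nRows+1 1], size(b));
    nRows = nRows + size(b, 1);
end

data = h5read(h5file, '/wind');                  % whole dataset, as single
disp(size(data))
```

A later analysis step can read just a slice with h5read(h5file, '/wind', start, count) instead of loading the whole dataset.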
Justin
on 24 Apr 2014
Edited: Justin
on 24 Apr 2014
One thing that might help is increasing the Java Heap Memory. Go to the Home tab > Preferences > General > Java Heap Memory.
The default is 128 MB; try something conservative first, such as 256 MB. If you set it too high, you will have to edit configuration files by hand before MATLAB can start again, so increase it gradually.
Another option is to do your analysis on each file separately, or to pull out only the data you need from each file one at a time, so that the entire contents of all the files do not have to stay in memory.
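A minimal sketch of the per-file approach, assuming the analysis can be reduced to a few numbers per file. Here a hypothetical mean and maximum of a single column are computed from two small stand-in files, and only that summary is kept:

```matlab
% Hypothetical stand-in files with one numeric column each
folder = tempname;
mkdir(folder);
files = {fullfile(folder, 'a.txt'), fullfile(folder, 'b.txt')};
contents = {'10\n20\n30\n', '5\n15\n'};
for k = 1:numel(files)
    fid = fopen(files{k}, 'w');
    fprintf(fid, contents{k});
    fclose(fid);
end

summary = zeros(numel(files), 2);   % one row per file: [mean, max]
for k = 1:numel(files)
    fid = fopen(files{k}, 'r');
    C = textscan(fid, '%f');
    fclose(fid);
    v = C{1};
    summary(k, :) = [mean(v), max(v)];
    clear C v                        % drop this file's data before reading the next
end
disp(summary)
```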
EDIT:
Jeremy Hughes
on 13 Mar 2017
Edited: Jeremy Hughes
on 13 Mar 2017
Hi, if you have access to R2014b or later, I'd recommend using DATASTORE to manage your import. It automatically breaks files into blocks and manages multiple files.
ds = datastore(folder);   % folder: path to the directory that holds the data files
% Choose the columns you want to import, e.g. the 1st, 3rd and 5th:
ds.SelectedVariableNames = ds.VariableNames([1 3 5]);
while hasdata(ds)
    t = read(ds);   % returns a table with the data for the current block
    % do stuff
end
This should do what you need. https://www.mathworks.com/help/matlab/datastore.html
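A self-contained sketch of the same loop, assuming a release where datastore is available (R2014b or later) and two small hypothetical CSV files with a header row (Time, Speed, Dir). ReadSize sets the maximum number of rows each read() returns, so the last block can be shorter. The running sum stands in for "% do stuff".

```matlab
% Hypothetical stand-in files sharing the same header
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'y1979.csv'), 'w');
fprintf(fid, 'Time,Speed,Dir\n1,3.5,90\n2,4.5,180\n');
fclose(fid);
fid = fopen(fullfile(folder, 'y1980.csv'), 'w');
fprintf(fid, 'Time,Speed,Dir\n3,2.0,270\n');
fclose(fid);

ds = datastore(folder, 'Type', 'tabulartext', 'FileExtensions', '.csv');
ds.SelectedVariableNames = {'Speed'};   % import only the column that is needed
ds.ReadSize = 2;                         % at most 2 rows per read()

total = 0;
nRows = 0;
while hasdata(ds)
    t = read(ds);                        % table holding the current block
    total = total + sum(t.Speed);
    nRows = nRows + height(t);
end
fprintf('%d rows, mean speed %g\n', nRows, total / nRows);
```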