Binary, ASCII and Compression Algorithms

Matlab2010 on 11 Jul 2014
Edited: José-Luis on 14 Jul 2014
I have a large number (> 1E6) of ASCII files (myFile.txt) containing time series data, all in the same format: timestamp, field 1, field 2, ..., field 20. Each data entry is one row, tab-separated. Each of the fields 2-20 is a double. The timestamp is a string (HH:MM:SS.FFF). Each file is about 5 GB in size.
I wish to reduce the hard disk storage required. How can I do this?
My thoughts so far are:
1. Convert the files to binary format. How can I do this? Is it by applying dec2bin.m? However this function seems to only take scalars. What would this look like?
2. Compress each file. Each file may be used independently of the others, thus I wish to compress individually. I know that differing approaches to compression work differently for different data structures. Given my data structure above, which is the best one to apply?
Given the importance of this, I would be happy to call code written in other languages (e.g. C++) from inside MATLAB. Are there any standard libraries or third-party tools that can be recommended?
3. Any other suggestions?
Finally, an important point: the user must be able to load and access the data in each file quickly, i.e. both the bin2dec() call and the decompression must be fast.
Thank you!
  3 comments
Matlab2010 on 14 Jul 2014
Edited: Matlab2010 on 14 Jul 2014
I don't want to use a database because of the I/O costs.
The binary files would contain no text, as I would convert the timestamps to a numeric format (e.g. using datenum.m).
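One way the timestamp conversion described here could look, sketched in Python (one of the languages the data should be readable from) rather than MATLAB. Instead of datenum.m, this sketch assumes the timestamp is stored as integer milliseconds since midnight; the function names are hypothetical and chosen only for illustration.

```python
def timestamp_to_ms(ts: str) -> int:
    """Convert an 'HH:MM:SS.FFF' string to integer milliseconds since midnight.

    Assumption: the timestamp carries no date, so midnight is the reference.
    """
    hms, _, frac = ts.partition(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    # Pad or truncate the fraction to exactly three digits (milliseconds).
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def ms_to_timestamp(ms: int) -> str:
    """Inverse of timestamp_to_ms: milliseconds since midnight to 'HH:MM:SS.FFF'."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


example_ms = timestamp_to_ms("09:30:15.250")
example_back = ms_to_timestamp(example_ms)
```

The largest possible value (23:59:59.999) is 86399999, which fits in an unsigned 32-bit integer, so each 12-character timestamp would take 4 bytes in a binary file.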
José-Luis on 14 Jul 2014
Edited: José-Luis on 14 Jul 2014
I would use a database. Which one is mostly down to personal preference and constraints. I like MySQL because it's free.
Depending on what your data looks like, you could use the netCDF format. It can be read and written from MATLAB. The same is true for HDF5. These are sort of lightweight databases, though.
IMO, I/O through a database would be faster than wading through the mountain of files you have, unless you plan on hard-coding file paths. I haven't tested it, though, so that's not definite.


Answers (1)

Star Strider on 11 Jul 2014
I would read them in as text files, save them as ‘.mat’ files (in the default binary format), then delete the text files. Since the ‘.mat’ files have a different suffix/extension, the prefix name can be the same as for the text file. See the documentation for save and load for details.
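The same convert-the-text-to-binary idea can also be sketched with a plain fixed-width binary layout instead of a ‘.mat’ file. The Python sketch below is only an illustration under stated assumptions: each row is a timestamp followed by 20 doubles, the timestamp is stored as milliseconds since midnight in an unsigned 32-bit integer, and everything is little-endian. The names `RECORD`, `parse_line`, `text_to_binary` and `read_record` are hypothetical.

```python
import io
import struct

# Assumed record layout: uint32 (ms since midnight) + 20 float64, little-endian.
N_FIELDS = 20
RECORD = struct.Struct("<I%dd" % N_FIELDS)  # 4 + 20 * 8 = 164 bytes, no padding


def parse_line(line):
    """Turn one tab-separated text row into a tuple (ms, field 1, ..., field 20)."""
    parts = line.rstrip("\n").split("\t")
    hms, _, frac = parts[0].partition(".")
    h, m, s = (int(p) for p in hms.split(":"))
    ms = ((h * 60 + m) * 60 + s) * 1000 + int(frac.ljust(3, "0")[:3])
    values = [float(v) for v in parts[1:1 + N_FIELDS]]
    return (ms, *values)


def text_to_binary(text_stream, bin_stream):
    """Stream rows from text to fixed-width binary; returns the row count."""
    count = 0
    for line in text_stream:
        if line.strip():
            bin_stream.write(RECORD.pack(*parse_line(line)))
            count += 1
    return count


def read_record(bin_stream, index):
    """Jump straight to row `index` without reading the rows before it."""
    bin_stream.seek(index * RECORD.size)
    return RECORD.unpack(bin_stream.read(RECORD.size))


# Tiny in-memory demonstration with two made-up rows.
rows = [
    "09:30:15.250\t" + "\t".join(str(i + 0.5) for i in range(N_FIELDS)),
    "09:30:15.500\t" + "\t".join(str(i * 2.0) for i in range(N_FIELDS)),
]
binary = io.BytesIO()
n_rows = text_to_binary(io.StringIO("\n".join(rows) + "\n"), binary)
second = read_record(binary, 1)
```

Because every row has the same size (164 bytes under these assumptions), any row can be reached with a single seek, which is one way to meet the quick-access requirement. A fixed-width little-endian layout like this can in principle also be read with fread in MATLAB and readBin in R, although that is not shown here.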
  2 comments
Matlab2010 on 11 Jul 2014
1. I would like to be able to access the data from Python and R as well as MATLAB.
2. Does compressing .mat files help much, e.g. with zip.m?
Star Strider on 11 Jul 2014
Edited: Star Strider on 11 Jul 2014
  1. If you want to access the files from other applications, your best option would be to go with something other than .mat files, since to the best of my knowledge, those are MATLAB-specific. I’m not familiar with the file types Python and R can read and write, so you would need to find a common, space-efficient file format for all three applications.
  2. Compressing them would help. You probably have to go that route anyway, considering the sizes of the files.
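How much compression helps depends on the data, so it is worth measuring on a sample before committing to a codec. The Python sketch below is a rough illustration, not a benchmark: it builds a synthetic, slowly varying series (an assumption, since the real data is not shown), stores it both as text and as packed doubles, and compresses each with two standard-library codecs. The helper name `compress_report` is hypothetical.

```python
import lzma
import random
import struct
import zlib


def compress_report(payload: bytes) -> dict:
    """Return the raw size and the compressed sizes under two stdlib codecs."""
    return {
        "raw": len(payload),
        "zlib": len(zlib.compress(payload, 6)),
        "lzma": len(lzma.compress(payload)),
    }


# Synthetic stand-in for one column of one file: a small random walk.
random.seed(0)
value, values = 100.0, []
for _ in range(5000):
    value += random.choice((-0.01, 0.0, 0.01))
    values.append(round(value, 2))

as_text = "\n".join("%.2f" % v for v in values).encode("ascii")
as_binary = struct.pack("<%dd" % len(values), *values)

text_sizes = compress_report(as_text)
binary_sizes = compress_report(as_binary)

# Decompression must return exactly the bytes that went in.
restored = zlib.decompress(zlib.compress(as_binary, 6))

print("text  :", text_sizes)
print("binary:", binary_sizes)
```

Each file is compressed on its own here, which matches the requirement that files can be used independently. The same comparison should be repeated on a real sample file, and decompression time measured alongside size, before choosing between a faster codec such as zlib and a denser one such as lzma.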
