
Variable cannot be saved to MAT-file whose version is older than 7.3

I have an ASCII data file (14.4 MB), which I convert to a *.mat file in MATLAB using a function I wrote myself.
The resulting *.mat file is 15.8 MB and contains:
- a cell array (950000x17 cell) with a string in each cell
- a numerical array (950000x5 double).
Saving it with the 'save' function works without problems.
Problem: I have another ASCII data file (22.2 MB), which I also try to convert to a *.mat file in MATLAB by saving these two arrays:
- a cell array 'text_data' (1400000x17) with a string in each cell
- a numerical array (1400000x5 double).
Then I get the following message: "Warning: Variable 'text_data' cannot be saved to a MAT-file whose version is older than 7.3. To save this variable, use the -v7.3 switch. Skipping...".
When I use save(filename, ..., '-v7.3') I get a *.mat file of 6.48 GB!
Questions: Why does saving fail for the contents of the 22.2 MB ASCII file when it works for the contents of the 14.4 MB ASCII file? And why is the *.mat file written with the -v7.3 switch so disproportionately large?
I work on a 64-bit computer (Windows 7) with MATLAB R2011b.

Accepted Answer

Jan on 22 Nov 2012
Edited: Jan on 22 Nov 2012
Just to be sure: you read a 22.2 MB ASCII file into a cell string, which afterwards occupies 2.2 GB of RAM? A {1400000 x 17} cell string stores its strings with 2 bytes per character, plus 23800000 headers for the CHAR arrays, each of which takes about 100 bytes. That is an overhead of about 2.4 GB, so the actual data is negligible by comparison. Clearly, large cell strings are not represented efficiently in MATLAB. It would be far more efficient to store one string together with a list of indices of the start positions (which also imply the end positions). Converting between the two representations is cheap:
% Cell string -> Block string:
C = cell(1, 1000);
C(:) = {'asd'}; % Arbitrary test data
Index = [1, cumsum(cellfun('length', C)) + 1]; % Start position of each string, plus one past the end
Block = cat(2, C{:}); % All strings joined into one CHAR vector
% ==> Now save Block and Index to the MAT file.
% Back to cell string:
n = numel(Index) - 1;
C2 = cell(1, n);
i1 = Index(1);
for k = 1:n
    i2 = Index(k + 1);
    C2{k} = Block(i1:i2 - 1);
    i1 = i2;
end
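The full round trip through a MAT file could look roughly like the sketch below. The test strings, the temporary file name and the variable `roundTripOk` are only assumptions for illustration and do not come from the original data.

```matlab
% Small test data standing in for the real cell string (illustrative only)
C = {'alpha', 'be', 'gamma', 'd'};

% Pack into one CHAR vector plus the start positions
Index = [1, cumsum(cellfun('length', C)) + 1];
Block = cat(2, C{:});

% Save only the compact representation (temporary file name is arbitrary)
matFile = [tempname, '.mat'];
save(matFile, 'Block', 'Index');

% Load the file again and rebuild the cell string
S = load(matFile);
n = numel(S.Index) - 1;
C2 = cell(1, n);
for k = 1:n
    C2{k} = S.Block(S.Index(k):S.Index(k + 1) - 1);
end
delete(matFile);

% True if the rebuilt cell string matches the original
roundTripOk = isequal(C, C2);
```

For a two-dimensional cell such as the 1400000x17 array, `C{:}` walks the cells in column-major order, so one option is to store `size(C)` as well and call `reshape(C2, size(C))` after rebuilding.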
If you are working with ASCII characters only, use the type UINT8 to save half of the memory.
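As a rough sketch of the UINT8 idea, the block string can be converted with `uint8` and turned back with `char` when needed. The variable names `Block8`, `isLossless` and `Restored` are made up for this example, and the check only guards against characters that do not fit into one byte.

```matlab
C = repmat({'asd'}, 1, 1000);   % Arbitrary ASCII-only test data
Block = cat(2, C{:});            % CHAR vector: 2 bytes per character
Block8 = uint8(Block);           % UINT8 vector: 1 byte per character

% Compare the memory used by both representations
infoChar = whos('Block');
info8 = whos('Block8');
fprintf('CHAR: %d bytes, UINT8: %d bytes\n', infoChar.bytes, info8.bytes);

% UINT8 saturates at 255, so verify that no character code is larger
isLossless = all(double(Block) <= 255);

% Convert back to CHAR, e.g. after loading from the MAT file
Restored = char(Block8);
```

The `Index` vector stays unchanged, because the positions of the strings are the same in both representations.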
Developing a BlockString type with a set of corresponding functions would be a nice project. For large cell strings, the massive reduction in memory usage would speed up processing substantially.
2 Comments
Paul on 23 Nov 2012
Thanks for your answer. I understand your code. So I can save a big cell array of strings using your transformation, and after loading the *.mat data I can transform it back to the "original" cell array format.
So most of the RAM is taken up by the headers of the CHAR arrays in the cell? I didn't know that.
Yes, I am working only with ASCII characters. How can I read and save them as UINT8 when I am using the 'fgetl' function to read my ASCII text? When I use save('filename',...,'-ascii') only the numerical array is saved, but not the cell array 'text_data', which contains the strings.
Jan on 23 Nov 2012
Do not use the -ASCII format for saving large files. The ASCII format is useful when a human needs to inspect the file manually, but no human can read 22 MB of numbers. Binary MAT files are therefore much better for your purpose.
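One possible way to combine an FGETL loop with UINT8 storage and a binary MAT file is sketched below. The temporary text file, its contents and the names `Lines`, `txtFile` and `matFile` are assumptions made so the example is self-contained; the real reading function will look different.

```matlab
% Write a tiny ASCII file so that the sketch is self-contained
txtFile = [tempname, '.txt'];
fid = fopen(txtFile, 'w');
fprintf(fid, 'first line\nsecond\nthird one\n');
fclose(fid);

% Read line by line with FGETL and convert each line to UINT8
fid = fopen(txtFile, 'r');
Lines = {};
tline = fgetl(fid);
while ischar(tline)                 % FGETL returns -1 at the end of the file
    Lines{end + 1} = uint8(tline);  % Growing in a loop is fine for a sketch
    tline = fgetl(fid);
end
fclose(fid);
delete(txtFile);

% Build the block representation and save it as a binary MAT file
Index = [1, cumsum(cellfun('length', Lines)) + 1];
Block = cat(2, Lines{:});           % UINT8 row vector
matFile = [tempname, '.mat'];
save(matFile, 'Block', 'Index');
```

After loading, a single line is recovered with `char(Block(Index(k):Index(k + 1) - 1))`.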

More Answers (1)

Titus Edelhofer on 22 Nov 2012
Hi,
what does
whos text_data
tell you about the size of text_data? I suspect it is more than 2 GB (e.g. 6.5 GB?), which is why it cannot be saved with formats older than 7.3.
The 7.3 format is uncompressed, so the file will be much larger than one written in the compressed 7.0 format (as used for the smaller file).
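The difference in file size can be checked on a small stand-in array before trying it on the real data. This is only a sketch: the array size and the file names are assumptions, and the resulting numbers depend on the MATLAB release and on the data.

```matlab
% Small stand-in for the real 1400000x17 cell array (illustrative size)
text_data = repmat({'asd'}, 1000, 17);

% Save the same variable in both MAT-file versions
fileV7  = [tempname, '.mat'];
fileV73 = [tempname, '.mat'];
save(fileV7,  'text_data', '-v7');
save(fileV73, 'text_data', '-v7.3');

% Compare the sizes of the two files on disk
dV7  = dir(fileV7);
dV73 = dir(fileV73);
fprintf('-v7:   %d bytes\n-v7.3: %d bytes\n', dV7.bytes, dV73.bytes);

delete(fileV7);
delete(fileV73);
```

Comparing the two numbers with the output of `whos text_data` shows how much of the size comes from the file format and how much from the in-memory representation.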
Titus
1 Comment
Paul on 22 Nov 2012
Edited: Paul on 22 Nov 2012
You are right: 'text_data' from the 22.2 MB ASCII file has 2339695382 bytes (~2.18 GB),
while 'text_data' from the 14.4 MB ASCII file only has 1.38 GB.
Is there a way to save the cell array 'text_data' (2.18 GB) to a small *.mat file? 6.5 GB is far too large!
