matfile and half inefficient storage
Dear MATLAB users,
I have encountered the following inefficient storage problem:
delete('myfile.mat')
handle = matfile('myfile.mat')
handle.X = half(X); % X is big
handle.Y = half(Y); % Y is big
handle.a = a;
handle.b = b;
%%% the size of myfile.mat is 2.4Gb %%%
data = load('myfile');
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
%%% the size of mynewfile1.mat is 1.2Gb %%%
data = load('myfile');
save('mynewfile2.mat', '-struct', 'data')
%%% the size of mynewfile2.mat is 1.2Gb %%%
What could be causing this doubling of storage, and how can I avoid it without loading and resaving the file?
Update: the problem does not seem to be caused by the -v7.3 flag. I updated the code above to show this.
Thank you for your help.
30 comments
Image Analyst
on 24 Jul 2021
Why are you saving it in 7.3 (old) format?
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
-v7.3 is the latest version; -v7 is the default (https://www.mathworks.com/help/matlab/ref/save.html) set by TMW in the preferences, apparently for compatibility.
There's a note in the doc under the 'version' named parameter that says--
"Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For cell arrays, structure arrays, or other containers that can store heterogeneous data types, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files."
The blowup is something I've noted in some other Q? over last few months -- there was another conversation just the other day it seems where a file was saved also at something like 2X the size w/ -v7.3 flag but the save command w/o the flag was half the size. Turned out it's in the preferences that the -v7 flag is set by default on initial install.
Seems as though this needs some attention from TMW -- the huge blow-up in size indicates something's not kosher/as intended in the implementation.
Quite possible; there's got to be overhead with the matfile object in order to be able to access pieces-parts.
Alternatively, what does half actually do? Does it create some object or what? I don't have any of the TBs that have it so not sure.
Just for checking, what are the settings in Preferences > General > MAT-Files? Just so we know for sure what version is used with no explicit flag on the command line.
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
OK, that the default is -v7 and that both
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
save('mynewfile2.mat', '-struct', 'data')
returned the same size file shows the different file size is not related to the version for whatever data actually is.
Now, what we (at least me, since I can't test) don't know yet is what half actually returns -- the doc above was unclear.
What does
x=half(X);
whos x X
return?
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
Well, then, it would seem it is the matfile overhead that's the killer -- if you just store X and x, nothing untoward happens, does it?
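One way to test the matfile-overhead hypothesis without the original data is to write the same array once through matfile and once through save, then compare the sizes on disk. The following is only a minimal sketch: it uses single as a stand-in because half requires an additional toolbox, and the array size and temporary file names are arbitrary assumptions, so the numbers it prints need not match the 2.4 GB vs. 1.2 GB reported above.

```matlab
% Stand-in data (assumption): single instead of half, arbitrary size.
X = single(rand(2000, 2000));

fMat  = [tempname '.mat'];    % file written through matfile
fSave = [tempname '.mat'];    % file written through save -v7.3

% Write via matfile; the file is created on the first assignment.
m = matfile(fMat, 'Writable', true);
m.X = X;
clear m                       % drop the matfile object before measuring

% Write the same variable via save in the same MAT-file version.
save(fSave, 'X', '-v7.3');

% Compare the sizes on disk.
dMat  = dir(fMat);
dSave = dir(fSave);
fprintf('matfile: %.1f MB, save -v7.3: %.1f MB\n', ...
    dMat.bytes / 2^20, dSave.bytes / 2^20);
```

Repeating the comparison with half(X) in place of the single array, on a machine where half is available, would show whether the size difference is specific to that type.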
Mika
on 24 Jul 2021
Walter Roberson
on 25 Jul 2021
"But I need matfile to save in a parfor loop."
I have not seen any guarantee that two different processes writing to the same matfile() will not interfere with each other.
The file structure designed for simultaneous access is memmapfile().
Mika
on 25 Jul 2021
Walter Roberson
on 25 Jul 2021
Please explain more about why using parfor requires you to use matfile? As opposed to just saving (possibly using 7.3 if you have big objects)?
We've eliminated everything on the size conundrum excepting matfile, with the exception that we haven't seen the explicit result of a save statement for the half object (that I can't test). We got so far as to show it didn't show extra memory used via whos, but that doesn't prove save didn't need some extra info to go with it. One presumes not, but it hasn't been proven.
If performance is a Q? as I would presume it would be using parfor anyways, the matfile solution may seem "elegant" in minimizing source code, but I think it would still be a sizable time hit even without the file size issue as compared to the suggested workaround.
Mika
on 26 Jul 2021
I had presumed that would be the result, but since I couldn't/can't test, just for the record... :)
I agree, I think it's well worth bringing to their explicit attention (altho I would presume they're already aware of it) as it appears they may need to re-examine just what is causing such a huge blowup and rethink what they're doing going forward.
While they probably won't classify it as a bug since it seems to still work to provide the documented functionality, certainly from a performance and quality of implementation POV it deserves to be flagged.
Walter Roberson
on 26 Jul 2021
Using a small auxiliary function to do the save() is what is recommended.
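A minimal sketch of that pattern follows, assuming each iteration can write its own file, which also avoids several workers writing to one MAT-file. The helper name saveResult, the output folder, and the file naming are hypothetical; the random matrix stands in for the real per-iteration result.

```matlab
outDir = tempname;            % hypothetical output folder
mkdir(outDir);
nTasks = 4;

parfor k = 1:nTasks
    X = rand(100, 100) * k;   % stand-in for the real per-iteration result
    fname = fullfile(outDir, sprintf('result_%03d.mat', k));
    % Calling save directly in the parfor body is a transparency
    % violation, so the call is delegated to a helper function.
    saveResult(fname, X);
end

files = dir(fullfile(outDir, 'result_*.mat'));
fprintf('%d files written\n', numel(files));

function saveResult(fname, X)
% Hypothetical helper: save runs in this function's own workspace,
% writing one file per iteration.
save(fname, 'X', '-v7.3');
end
```

The '-v7.3' flag is only needed for variables larger than 2 GB; for smaller results the default version works as well.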
dpb
on 26 Jul 2021
That avoids it, but doesn't resolve that storage requirements blow up remarkably with matfile which seems to me at least to be a problem even if one can get around it in some instances by not using it. If never going to use it, isn't much point in having it in the language... :)
James Tursa
on 27 Jul 2021
For the record, half data types are stored as opaque classdef objects. They are fundamentally different from the other native numeric types such as double and single. Whether this has anything to do with the behavior I don't know.
Walter Roberson
on 27 Jul 2021
Good point, James. The representation of classdef objects can end up being quite different in HDF5.
dpb
on 27 Jul 2021
But the testing didn't show any difference w/ save of the raw type; only w/ matfile...
Eike Blechschmidt
on 28 Jul 2021
You could do the following and see if there is a difference in how the files are stored as hdf5 files:
h5disp('myfile.mat');
h5disp('mynewfile1.mat');
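Because h5disp output can be long for large files, a programmatic comparison may be easier to read. The sketch below uses h5info to walk a v7.3 MAT-file and list each dataset's chunk size and compression filters. The helper listDatasets and the small demo file are assumptions for illustration, not the poster's data; to compare the real files, pass 'myfile.mat' and 'mynewfile1.mat' to h5info instead.

```matlab
% Small v7.3 MAT-file to inspect (stand-in data, not the poster's).
A = zeros(1000, 1000);
fname = [tempname '.mat'];
save(fname, 'A', '-v7.3');

info = h5info(fname);
report = listDatasets(info, '');
disp(report);                 % columns: dataset path, chunk size, filters

function out = listDatasets(group, prefix)
% Hypothetical helper: recursively collect path, chunk size, and filter
% names for every dataset below the given group.
out = cell(0, 3);
for k = 1:numel(group.Datasets)
    d = group.Datasets(k);
    if isempty(d.Filters)
        filt = 'none';
    else
        filt = strjoin({d.Filters.Name}, ',');
    end
    out(end+1, :) = {[prefix '/' d.Name], mat2str(d.ChunkSize), filt};
end
for k = 1:numel(group.Groups)
    % Subgroup names returned by h5info are already full paths.
    sub = group.Groups(k);
    out = [out; listDatasets(sub, sub.Name)];
end
end
```

If the datasets in the matfile-written file show no filters while the resaved file shows a compression filter, or if the chunk sizes differ markedly, that would point to the source of the size difference.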
Mika
on 29 Jul 2021
Q490
on 16 Aug 2021
As a side note, and not sure if this is directly related to an answer to your question, a function I've found very useful that can be a good substitute for using matfile is "savefast", written by Tim Holy (https://www.mathworks.com/matlabcentral/profile/authors/1337381) and which can be downloaded at:
For the file sizes you are talking about it saves it extremely quickly and in the smallest possible file size. I highly recommend it.
Mika
on 18 Aug 2021
Pavithra Jayachandran
on 21 Aug 2021
Thank you I will try
xingxingcui
on 24 Aug 2021
Edited: xingxingcui
on 24 Aug 2021
Similar question here; TMW should provide an effective solution.
S Priyadharshini
on 30 Aug 2021
Myfile and mynewfile2.mat
Answers (0)