Mat file created with matfile() is slow and creates large files for no obvious reason

I'm confused about the purpose of the matfile() function. It seems like it might be a way to append to mat files during execution, but using it this way is extremely slow and creates much larger files than using save().
For example, take the following code snippet:
m = matfile('foo.mat','Writable',true);
m.A=zeros(3,1);
for i=1:100000 % this loop takes about 5 minutes to run on my PC
m.A(:,i)=rand(3,1);
end
load('foo.mat'); % foo.mat is 25 MB??
save('bar.mat','A') % bar.mat is 2 MB, but has identical contents to foo.mat??
This creates two mat files with identical contents, one of which is 25 MB and the other of which is 2 MB. Writing to the mat file in the loop using the matfile handle also takes a huge amount of time. Could anyone explain what's going on here?

Answers (1)

Simar on 19 Jun 2024
The matfile function in MATLAB provides a way to access and modify variables in MAT-files without loading the entire file into memory. This is useful when working with large datasets that do not fit into memory. However, the way matfile handles data can lead to inefficiencies in certain scenarios, as you have discovered.
Using matfile for incremental writes is slower and produces larger files due to frequent disk access, potential space reallocation, and added file overhead. In contrast, the save function compresses data efficiently in a single operation, resulting in faster performance and smaller file sizes.
In the shared example, the loop writes to foo.mat 100,000 times, and A grows by one column on each pass. This is slow because every write operation incurs file I/O overhead. The resulting file is also larger because it carries more overhead and compresses less effectively when written incrementally.
When you save A to bar.mat using the save function, MATLAB writes the entire array to disk in one operation, which allows it to compress the data efficiently and minimize file overhead, resulting in a faster operation and a smaller file.
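As a rough illustration of the single-write approach described above, the sketch below builds the same 3-by-100000 array in memory and saves it once. The file name bar.mat comes from the question; the variable N and the size printout are added here for illustration only.

```matlab
% Build the full 3-by-N array in memory, then write it to disk once.
N = 100000;
A = zeros(3, N);            % preallocate in memory so A does not grow in the loop
for i = 1:N
    A(:, i) = rand(3, 1);   % fill one column per iteration, in memory only
end
save('bar.mat', 'A');       % single write to disk

% Report the resulting file size (illustrative check, not part of the original post).
info = dir('bar.mat');
fprintf('bar.mat: %.1f MB\n', info.bytes / 2^20);
```

Because the loop never touches the disk, the only file I/O cost is the final save call.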
Recommendations:
  • matfile is best used when dealing with data that is too large to fit into memory. For smaller datasets, or when performance is a concern, consider constructing the variable in memory first and then saving it in one operation.
  • Preallocate Space: If the final size of the variable is known, preallocating space in the MAT-file can sometimes improve performance and reduce file fragmentation, although this will not necessarily reduce the final file size.
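If matfile is still needed (for example, because the data would not fit in memory), the preallocation advice can be sketched as follows. This is a minimal example under assumptions: the file name foo_chunked.mat and the block size of 10000 columns are hypothetical choices, and writing in blocks rather than single columns is an addition here to cut down the number of disk writes; actual timings and file sizes will depend on your system.

```matlab
% Preallocate the variable in the MAT-file, then write it in blocks
% instead of one column at a time.
N = 100000;
blockSize = 10000;                              % hypothetical block size; tune as needed
m = matfile('foo_chunked.mat', 'Writable', true);
m.A(3, N) = 0;                                  % assigning the last element sizes A as 3-by-N on disk

for first = 1:blockSize:N
    last = min(first + blockSize - 1, N);
    % One write per block: 10 writes here instead of 100,000.
    m.A(:, first:last) = rand(3, last - first + 1);
end

disp(size(m, 'A'));                             % query the size without loading A
```

Only one block is held in memory at a time, so the approach still scales to arrays that are too large to load in full.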
While matfile provides a flexible interface for working with large datasets, its performance characteristics and impact on file size make it less suitable for scenarios where data can be efficiently handled in memory.
Hope it helps!
Best Regards,
Simar
