Why does converting a table to a struct increase memory usage by 15x?

Brian Kardon on 28 Oct 2019
Commented: Peter Perkins on 1 Nov 2019
I'm reading tabular data from a file using the "readtable" function - each file has 54 fields and 1000 rows. It takes up 250 kB on disk, and 450 kB in memory as a table. Then, when I try to convert the table to a struct using the "table2struct" function, the resulting struct takes up 6.5 MB!!! Why does converting from a table to a struct result in a 15x increase in memory usage? I have several thousand of these files to manipulate, so 450 kB per file is fine, but 6.5 MB makes MATLAB run out of memory! No good.
Here's some output to verify my assertions:
>> t = readtable('example_file.dat');
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property.
>> t.Properties
ans =
  struct with fields:
             Description: ''
                UserData: []
          DimensionNames: {'Row'  'Variables'}
           VariableNames: {1×54 cell}
    VariableDescriptions: {1×54 cell}
           VariableUnits: {}
                RowNames: {}
>> size(t)
ans =
        1000          54
>> ts = table2struct(t);
>> size(ts)
ans =
        1000           1
>> whos t ts
  Name         Size              Bytes  Class     Attributes
  t         1000x54             457776  table
  ts        1000x1             6483456  struct
Why does converting from table to struct waste so much memory, and how can I fix it?
Thanks in advance for any help!
PS: For some reason this form won't allow me to select a release - I'm using MATLAB R2017a.
2 comments
Walter Roberson on 28 Oct 2019
Table objects store one datatype per variable (and more for variables that are cell arrays).
Struct arrays store one datatype per field per struct array element.
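To make that concrete, here is a rough sketch of the arithmetic, assuming every variable in the table is a double. The 112 bytes of overhead per stored value and the 64 bytes per field name are not documented figures. They are inferred from the whos output posted in this thread, so treat them as assumptions.

```matlab
% Back-of-the-envelope model of the byte count whos reports for a struct
% whose fields all hold doubles. The first two constants are assumptions
% inferred from the numbers posted in this thread, not documented values.
headerBytes    = 112;  % assumed overhead for each separately stored value
fieldNameBytes = 64;   % assumed storage per field name
doubleBytes    = 8;    % one double

nRows = 1000;
nVars = 54;

% Struct array: every field of every element is its own 1x1 array,
% so the overhead is paid nRows*nVars times.
arrayBytes = nRows*nVars*(headerBytes + doubleBytes) + nVars*fieldNameBytes;

% Scalar struct: each field is one nRows-by-1 array,
% so the overhead is paid only nVars times.
scalarBytes = nVars*(headerBytes + nRows*doubleBytes) + nVars*fieldNameBytes;

fprintf('struct array : %d bytes\n', arrayBytes);
fprintf('scalar struct: %d bytes\n', scalarBytes);
```

Under those assumptions the struct array comes out at 6483456 bytes, which matches the whos figure for ts in the question. The scalar struct estimate is 441504 bytes, in the same range as the table.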
Steven Lord on 28 Oct 2019
PS: For some reason this form won't allow me to select a release - I'm using MATLAB R2017a.
Select the product first, then the release dropdown should populate.


Answers (1)

per isakson on 28 Oct 2019
Edited: per isakson on 28 Oct 2019
What kind of structure do you expect? table2struct can create two kinds.
  • struct array with one struct for each row of the table.
  • scalar struct with each column of the table stored as one field value
Try
t = readtable('example_file.dat');
struct_scalar = table2struct( t, 'ToScalar', true );  % one struct, each field holds a whole column
struct_array = table2struct(t);                       % struct array, one element per table row
whos
which returns
  Name                Size            Bytes  Class     Attributes
  struct_array       12x1             15040  struct
  struct_scalar       1x1              2720  struct
  t                  12x10             4068  table
where example_file.dat contains
f00 f01 f02 f03 f04 f05 f06 f07 f08 f09
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
I assume that you expected table2struct(t) to create a scalar struct.
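If you want to try the comparison without the data file, here is a minimal sketch that builds the same 12-by-10 table in code. The names a, b, col and info are only for illustration, and the byte counts will vary with the MATLAB release, so none are quoted here.

```matlab
% Rebuild the 12x10 example table in code instead of reading a file.
names = arrayfun(@(k) sprintf('f%02d', k), 0:9, 'UniformOutput', false);
t = array2table(repmat(0:9, 12, 1), 'VariableNames', names);

struct_array  = table2struct(t);                    % 12x1, one element per row
struct_scalar = table2struct(t, 'ToScalar', true);  % 1x1, one field per column

% The same value is reached with the indexing order swapped.
a = struct_array(5).f03;   % row 5 of the struct array, then field f03
b = struct_scalar.f03(5);  % field f03 of the scalar struct, then row 5

% Pulling a whole column out of the struct array needs a concatenation.
col = [struct_array.f03]';

% Compare the memory footprints reported by whos.
info = whos('struct_array', 'struct_scalar', 't');
for k = 1:numel(info)
    fprintf('%-14s %8d bytes\n', info(k).name, info(k).bytes);
end
```

The scalar struct keeps each column as a single array, much like the table does, which is why its footprint stays small.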
4 comments
Brian Kardon on 31 Oct 2019
Peter,
Actually, I'm starting to think using tables probably won't cause me headaches.
I'm more familiar with structs than tables. I was under the impression that a table in MATLAB was a "higher level" object that would incur more overhead during manipulation and processing than a struct, and that there were more functions for manipulating structs than for tables. Perhaps both of those assumptions are incorrect, and tables will be a good choice for my primary data structure!
Peter Perkins on 1 Nov 2019
It all depends on what you are doing.
If you are using a struct array (as opposed to a scalar struct each of whose fields is itself a vector), a table is a clear winner memory-wise. This makes a big difference as your data size gets larger.
Tables allow you to easily slice your data in two directions. A struct array lets you slice along the "array" dimensions, but not so easily along the "fields" dimension. And tables support operations like joins and sorting and unique and others that struct arrays don't. So syntactically, I think you will be happier with tables.
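As a small illustration of those operations, here is a sketch on a made-up table. The variable names id, value, label and name and the lookup table are invented for the example.

```matlab
% Small made-up table so the example is self-contained.
t = table([3;1;2;1], [30;10;20;10], {'c';'a';'b';'a'}, ...
    'VariableNames', {'id','value','label'});

rowsSlice = t(2:3, :);              % slice along the rows
varsSlice = t(:, {'id','value'});   % slice along the variables
sorted    = sortrows(t, 'id');      % sort rows by one variable
uniq      = unique(t);              % drop duplicate rows

% Join against a lookup table on the shared key variable 'id'.
lookup = table([1;2;3], {'one';'two';'three'}, ...
    'VariableNames', {'id','name'});
joined = innerjoin(t, lookup);

% With a struct array, row slicing is easy, but a field has to be
% gathered by concatenation before it can be used as a column.
s   = table2struct(t);
ids = [s(2:3).id];
```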
Performance-wise, it depends on what you are doing. Subscripted assignment and reference for tables are usually what people flag, but those have been getting faster over the last couple of releases (and that will continue). I think you want to go for ease of use and move away from tables only if you have real performance issues. And even then, it's usually possible to vectorize your code, or to "hoist" a few variables out of the table for a short scope in your code.
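A minimal sketch of what hoisting and vectorizing can look like, using a made-up table with variables x and y. Whether either one is needed depends on your release and your data.

```matlab
n = 1000;
t = table((1:n)', zeros(n,1), 'VariableNames', {'x','y'});

% Element-by-element subscripting into the table inside a loop.
% This is the pattern that tends to be flagged as slow.
for k = 1:n
    t.y(k) = 2*t.x(k);
end

% Hoisted: copy the variables out once, loop over plain arrays,
% and write the result back once.
x = t.x;
y = zeros(n,1);
for k = 1:n
    y(k) = 2*x(k);
end
t.y = y;

% Vectorized: no loop at all.
t.y = 2*t.x;
```

All three versions leave the same values in t.y. They differ only in how often the table itself is indexed.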
