Using memory allocation to split table

hal9k on 6 Jan 2021
Commented: hal9k on 7 Jan 2021
Below is an example that splits a table into chunks of chunkSize rows and saves each chunk as an individual .mat file. Here chunkSize is the number of rows per chunk.
How can I apply the same idea but split by memory size instead of by number of rows?
(I do not control how the data is created, so I cannot split it at the source.)
%%%% Example code
load patients
%% loading vars
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);
chunkSize = 28; % chunk size from number of rows
noOfChunks = ceil(size(patients,1) / chunkSize);
%% To Output chunks
for idx = 1:noOfChunks
    if idx == noOfChunks
        data = patients(1:end,:);
        patients(1:end,:) = [];
    else
        data = patients(1:chunkSize,:);
        patients(1:chunkSize,:) = [];
    end
    % Save data
    savefile = strcat('data',num2str(idx));
    save(savefile, 'data');
end
  2 Comments
Walter Roberson on 6 Jan 2021
patients(1:chunkSize,:) = []; % delete rows to save memory
That does not save memory.
You already have the entire patients table in memory, so you do not need to save memory in order to make room to add more entries.
What happens instead is that, to perform each deletion, MATLAB has to build a copy of the patients table without the indicated rows, replace the patients table with that copy, and then release the old version. This requires temporary copies of the table, repeatedly, with no intermediate benefit other than making the code marginally easier to write (because you can use fixed indices).
It may be worthwhile to clear the entire patients table after you have saved all of its chunks, but not otherwise, unless you were also growing the table at the same time through some other process.
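As an illustration of the point above, here is a minimal sketch of the same row-count split done by indexing into the table rather than deleting rows from it. The names rowStart and rowEnd are introduced here for illustration; the rest follows the question's code.

```matlab
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

chunkSize  = 28;                       % rows per chunk, as in the question
nRows      = height(patients);
noOfChunks = ceil(nRows / chunkSize);

for idx = 1:noOfChunks
    rowStart = (idx - 1) * chunkSize + 1;
    rowEnd   = min(idx * chunkSize, nRows);   % the last chunk may be shorter
    data     = patients(rowStart:rowEnd, :);  % copy only the rows being saved
    save(sprintf('data%d', idx), 'data');
end

clear patients   % optional: release the full table once every chunk is saved
```

The table is left untouched inside the loop, so the only extra memory needed at any time is the one chunk being written.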
hal9k on 6 Jan 2021
That makes sense. Thanks for the explanation.


Accepted Answer

Walter Roberson on 6 Jan 2021
Is LastName a fixed-width character array, a cell array of character vectors, or a string array?
If it is not a fixed-width character array, then you cannot predict the memory requirement of each row in advance and have to query the data to find it. It can be done; probably the easiest way would be:
name_bytes = cellfun(@length, patients.LastName)*2 + 104
The 104 is the basic size needed per cell array entry, to which you add the number of bytes occupied by the characters, at 2 bytes per character position.
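A quick way to sanity-check that estimate is to compare its total against what whos reports for the same variable. This is only a sketch: the 104-byte overhead is the figure quoted above and may differ between MATLAB releases, and the names estimated and info are introduced here for illustration.

```matlab
load patients   % provides LastName as a cell array of character vectors

% Estimate from the answer: 104 bytes of overhead per cell
% plus 2 bytes per character.
name_bytes = cellfun(@length, LastName) * 2 + 104;
estimated  = sum(name_bytes);

% Compare against the size MATLAB itself reports for the variable.
info = whos('LastName');
fprintf('estimated %d bytes, whos reports %d bytes\n', estimated, info.bytes);
```

If the two numbers disagree on your release, the per-cell overhead can be adjusted until they match.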
Gender,Age,Height,Weight,Smoker,Systolic,Diastolic
Those look to me to take a fixed number of bytes per entry, though that depends on how Systolic and Diastolic are recorded.
It looks to me as if the basic size of a table is 768 bytes, plus 210 bytes per variable.
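Putting these estimates together, one possible way to split by memory is to estimate the bytes in each row and then assign rows greedily to chunks until a byte budget is reached. The sketch below is an illustration under stated assumptions, not a tested recipe: the 104-byte, 768-byte and 210-byte figures are the estimates quoted above, cell columns are assumed to hold character vectors, and rowBytes, chunkId, budget and tableOverhead are names introduced here.

```matlab
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

chunkMemorySize = 1e4;                            % target bytes per chunk (example value)
nRows         = height(patients);
tableOverhead = 768 + 210 * width(patients);      % estimate quoted above, not verified
budget        = chunkMemorySize - tableOverhead;  % bytes left for the row data

% Estimate the bytes contributed by each row, one variable at a time.
rowBytes = zeros(nRows, 1);
for v = 1:width(patients)
    col = patients.(v);
    if iscellstr(col)
        % Variable-width text: per-cell overhead plus 2 bytes per character.
        rowBytes = rowBytes + cellfun(@numel, col) * 2 + 104;
    else
        % Everything else is treated as fixed-width: spread whos bytes evenly.
        s = whos('col');
        rowBytes = rowBytes + s.bytes / nRows;
    end
end

% Greedy assignment: start a new chunk when the next row would exceed the budget.
chunkId = zeros(nRows, 1);
current = 1;
used    = 0;
for r = 1:nRows
    if used > 0 && used + rowBytes(r) > budget
        current = current + 1;
        used    = 0;
    end
    chunkId(r) = current;
    used = used + rowBytes(r);
end

% Save each chunk without modifying the original table.
for idx = 1:current
    data = patients(chunkId == idx, :);
    save(sprintf('data%d', idx), 'data');
end
```

Every chunk receives at least one row, so a single row larger than the budget still gets saved in a chunk of its own rather than stalling the loop.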
  2 Comments
hal9k on 6 Jan 2021
LastName is a cell array of character vectors.
hal9k on 7 Jan 2021
This is probably not the most elegant way to do it, but I used the memory information from whos to work out how many rows fit in a chunk. It does not chunk to an exact memory size, but it lands in the right ballpark because the rows hold similar data.
load patients
%% loading vars
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);
%% Output chunks based on memory size
chunkMemorySize = 10^4; % Specify memory size in bytes
%% Test rows to find row size for memory chunking
% Grow a trial chunk one row at a time until whos reports it has reached the limit.
flag = 0;
chunkRowSize = 1;
while flag == 0 && chunkRowSize <= size(patients,1)
    tableChunk = patients(1:chunkRowSize,:);
    memS = whos('tableChunk');
    if memS.bytes < chunkMemorySize
        chunkRowSize = chunkRowSize + 1;
    else
        flag = 1;
    end
end
% The loop stops one row past the limit (or one past the last row), so step back.
% Keep at least one row per chunk in case a single row already exceeds the limit.
chunkRowSize = max(chunkRowSize - 1, 1);
%% Output chunks based on row count
noOfChunks_R = ceil(size(patients,1) / chunkRowSize);
rowStart = 1;
for idx = 1:noOfChunks_R
    rowEnd = min(rowStart + chunkRowSize - 1, size(patients,1)); % last chunk may be shorter
    data = patients(rowStart:rowEnd,:);
    rowStart = rowEnd + 1;
    savefile = strcat('data',num2str(idx));
    save(savefile, 'data');
end
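Because the approach above relies on the rows being similar in size, a rougher shortcut is to divide the whos size of the whole table by its row count and derive the rows per chunk from that average. This is only a sketch: avgRowBytes is a name introduced here, and the average includes the table's fixed overhead, so it slightly overestimates the cost of each row.

```matlab
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

chunkMemorySize = 10^4;                        % target bytes per chunk

info        = whos('patients');
avgRowBytes = info.bytes / height(patients);   % average bytes per row, overhead included

% Rows per chunk: at least one, and never more than the table holds.
chunkRowSize = floor(chunkMemorySize / avgRowBytes);
chunkRowSize = min(max(chunkRowSize, 1), height(patients));

fprintf('about %.0f bytes per row, %d rows per chunk\n', avgRowBytes, chunkRowSize);
```

This replaces the row-by-row search with a single whos call, at the cost of being less accurate when row sizes vary a lot.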
