Appending to a very large file

14 views (last 30 days)
Stefan Oline on 5 Jan 2021
Commented: Stefan Oline on 8 Mar 2021
Hi,
I'm having trouble writing very large files to disk. I'm appending 64 smaller files (each ~1 GB) into a single giant matrix. I expect the file to be ~64 GB, and I'm running into an "Out of memory" problem during processing. I'm wondering if there's a more efficient way to do this without needing to load all of the smaller files into memory before writing one monster file to disk. Is there a way for me to load each one at a time, append it to the file, then clear memory and load the next?
Current code looks like this:
clear
close all
clc
% Make a for loop to import every channel
for i = 1:64
    fprintf('i = %f\n', i);
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(i,'%02.f') '.ncs'], ...
        [0 0 0 0 1], 1, 1, []);
    %temp_1 = Samples';
    temp_2 = reshape(Samples,[],1)';
    if exist('signal_mat')
        signal_mat = vertcat(signal_mat,temp_2);
    else
        signal_mat = temp_2;
    end
    clear Samples Header temp_2
end
clear i
% Demedian the data
fprintf('Demedian data');
signal_med = median(signal_mat);
signal_mat_demed = signal_mat - signal_med;
%% Write to file for KS2
fprintf('Write data');
fid = fopen('myNewFile.dat', 'w');
fwrite(fid,signal_mat, 'int16');
fclose(fid);
fid = fopen('myNewFile_demed.dat', 'w');
fwrite(fid,signal_mat_demed, 'int16');
fclose(fid);
clear
fprintf('Done');

Accepted Answer

Jan on 7 Jan 2021
This line increases the problem:
signal_mat = vertcat(signal_mat,temp_2);
In the last step, for example, you concatenate a 63 GB array with a 1 GB array and copy the result into a new 64 GB array. That single step needs 63 + 64 GB of RAM at once.
Pre-allocation avoids this repeated copying. In your case it could work with 64 + X GB of RAM, where X might be 8 or 20, but even then this is a huge signal. How much RAM do you have?
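A minimal sketch of what pre-allocation could look like here (the Nlx2MatCSC call and the reshape are copied from your script; using 'single' as the class is just one option to keep the footprint below what 64 channels of doubles would need, since the data are written out as int16 anyway):
nChan = 64;
% Probe one channel to find out how many samples each file contributes
[Samples, ~] = Nlx2MatCSC('CSC01.ncs', [0 0 0 0 1], 1, 1, []);
nSamp = numel(Samples);
% One allocation up front instead of growing signal_mat on every iteration
signal_mat = zeros(nChan, nSamp, 'single');
signal_mat(1, :) = reshape(Samples, 1, []);
for i = 2:nChan
    [Samples, ~] = Nlx2MatCSC(['CSC' num2str(i,'%02.f') '.ncs'], [0 0 0 0 1], 1, 1, []);
    signal_mat(i, :) = reshape(Samples, 1, []);   % fill one row in place, no copy of the whole matrix
end
clear Samples
Each iteration only touches one row, so the peak memory is the pre-allocated matrix plus a single channel, rather than two nearly complete copies of the matrix at once.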
  2 comments
Stefan Oline on 7 Jan 2021
Preallocating the matrix was a huge help, thanks for pointing that out. I've got 64 GB of RAM. For the very large files, it sounds like I'll still have to use a datastore.
Stefan Oline on 8 Mar 2021
Hello, if I could ask a follow-up question: I'm having trouble writing the .dat file since it's so large (~64 GB). I attempted to follow the method here:
I'm having trouble doing two things.
  1. Denoising the 64 channels by finding the median across all 64 channels and subtracting that from each signal.
  2. Writing the output (a 64 x 407297536 matrix) to a .dat file, which will end up being ~64 GB.
Is there an easy way to demedian the signals, and then to write them to disk as one giant .dat?
Thanks very much.
%% User inputs
channels = 1:64;
demed_flag = 1;
store_flag = 1;
% Choose a directory to store the files
outDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg';
writeDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg\write';
%% Setup
n_channels = length(channels);
% Check how many samples are in a single channel
[Sample_check, Header] = Nlx2MatCSC(['CSC' num2str(channels(1),'%02.f') '.ncs'], ...
    [0 0 0 0 1], 1, 1, []);
temp_a = reshape(Sample_check,[],1)';
n_samples = length(temp_a);
clear Header Sample_check temp_a
%% Import .ncs data files from the channels list to individual .mat files
fprintf('*Importing data*\n')
for i = 1:n_channels
    disp(['Importing channel ' num2str(i) ' of ' num2str(n_channels) ' (' ...
        num2str(i/n_channels*100,2) '%)'])
    %fprintf('i = %.0f\n', i )
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(channels(i),'%02.f') '.ncs'], ...
        [0 0 0 0 1], 1, 1, []);
    data = reshape(Samples,[],1)';
    % Choose a file name - ensure these progress in order
    fname = fullfile(outDir, sprintf('data_%05d.mat', channels(i)));
    % Save the data and increment counters
    save(fname, 'data', '-v7.3');
    clear Samples Header data fname
end
clear i
fprintf('*Importing data complete*\n');
%% Create a datastore from the files
% Read the data back in as a tall array. First create a datastore ...
fprintf('*Creating a datastore*\n');
ds = fileDatastore(fullfile(outDir, '*.mat'), ...
    'ReadFcn', @(fname) getfield(load(fname), 'data'), ...
    'UniformRead', true);
fprintf('*Creating a datastore complete*\n');
% ... and then a tall array
fprintf('*Storing the datastore in a tall array*\n');
tdata = tall(ds);
fprintf('*Storing the datastore in a tall array complete*\n');
%% Demedian the signals
% ???
%% Write to file for KS2
if store_flag == 1
    fprintf('*Writing data*\n');
    fid = fopen('myNewFile_zeros.dat', 'w');
    fwrite(fid, tdata, 'int16');
    fclose(fid);
    fprintf('*Writing data complete*\n');
    beep
end
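If the tall array route keeps running out of memory at the write step, one alternative (a rough sketch, not tested on your data) is to stream the recording in blocks of samples: use matfile to read a range of columns from each per-channel .mat file saved above, subtract the per-sample median across the 64 channels, and append each block to the .dat with fwrite. This reuses channels, n_channels, n_samples, outDir and writeDir from the script above; blockSize and the output file name are placeholders:
blockSize = 1e6;                              % samples per block; tune to available RAM
m = cell(1, n_channels);
for c = 1:n_channels
    m{c} = matfile(fullfile(outDir, sprintf('data_%05d.mat', channels(c))));
end
fid = fopen(fullfile(writeDir, 'myNewFile_demed.dat'), 'w');
for s = 1:blockSize:n_samples
    e = min(s + blockSize - 1, n_samples);
    block = zeros(n_channels, e - s + 1);
    for c = 1:n_channels
        block(c, :) = m{c}.data(1, s:e);      % partial read: only this sample range is loaded
    end
    block = block - median(block, 1);         % demedian across channels, per sample
    fwrite(fid, int16(block), 'int16');       % column-major write keeps the channels interleaved per sample
    fprintf('Wrote samples %d-%d of %d\n', s, e, n_samples);
end
fclose(fid);
At blockSize = 1e6 each block is only about 0.5 GB of doubles (64 x 1e6 x 8 bytes), so memory stays far below the size of the full matrix, and the bytes land on disk in the same order that fwrite on the complete 64 x n_samples matrix would produce.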


More Answers (0)
