Optimising my data importer for large datasets

2 visualizaciones (últimos 30 días)
HC98
HC98 el 26 de Mzo. de 2023
Editada: Matt J el 26 de Mzo. de 2023
So I have this
txtFiles = dir('*.txt') ; %loads txt files
N = length(txtFiles) ;
Numit = N;
[~, reindex] = sort( str2double( regexp( {txtFiles.name}, '\d+', 'match', 'once' ))); % sorts files
txtFiles = txtFiles(reindex);
for i = 1:N
data = importdata(txtFiles(i).name);
x = data(:,1);
udata(:,i) = data(:,2) ;
end
I have quite a large dataset (well over 200 files) and it takes ages to load things. How can I speed this up? Is there some sort of prepocessing I can do like merge all the files into one or something? I don't know...
  1 comentario
Walter Roberson
Walter Roberson el 26 de Mzo. de 2023
I wonder if using a datastore would be appropriate for your work?

Iniciar sesión para comentar.

Respuestas (1)

Matt J
Matt J el 26 de Mzo. de 2023
Editada: Matt J el 26 de Mzo. de 2023
I don't see any pre-allocation of udata. Also, nothing is being done with x, so it will cut down on time if you don't create it.
udata=cell(1,N);
for i = 1:N
data = importdata(txtFiles(i).name);
%x = data(:,1);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);
  1 comentario
Matt J
Matt J el 26 de Mzo. de 2023
Editada: Matt J el 26 de Mzo. de 2023
If the data files have many columns, it will also go faster if you read in only the first two columns, maybe using textscan.
udata=cell(1,N);
for i = 1:N
fileID = fopen(txtFiles(i).name);
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);

Iniciar sesión para comentar.

Categorías

Más información sobre Large Files and Big Data en Help Center y File Exchange.

Etiquetas

Productos


Versión

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by