Here are several ideas, meant for teaching purposes. You are right that the loops are not efficient here. Only textscan reaches the achievable speed. First, a simplified loop:
tic
fid = fopen(file, 'r');
if fid == -1
  error('Cannot open file: %s', file);
end
iData = 0;
iHeader = 0;
Header = cell(100, 1);
Data = [];
while ~feof(fid)
  s = fgetl(fid);
  if strncmp(s, '#', 1)
    iHeader = iHeader + 1;
    Header{iHeader} = s;
  else
    iData = iData + 1;
    Data(iData, 1:2) = sscanf(s, ' %g %g');
  end
end
fclose(fid);
Header = Header(1:iHeader);
toc
Start with this simplified version. The chain strsplit(str2double(cell2mat(...))) wastes some time. On my MATLAB R2009a in a virtual machine, this version takes 33 sec instead of the 44 sec of the original (with strsplit replaced by regexp('split') in the old MATLAB version).
It still suffers from the missing pre-allocation of the data. Growing the array iteratively wastes a lot of resources. Search for "Schlemiel the Painter" on the net.
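To see the cost of the iterative growing in isolation, here is a minimal sketch that fills the same array once without and once with pre-allocation. The row count n and the variable names are made up for this illustration, and the timings will depend on the machine and the MATLAB release (newer releases handle growing arrays more gracefully, so the gap may be smaller than on R2009a).

```matlab
n = 1e5;   % hypothetical number of rows, chosen only for illustration

% Growing array: every new row forces MATLAB to enlarge the array
tic
grown = [];
for k = 1:n
  grown(k, 1:2) = [k, 2 * k];
end
tGrow = toc;

% Pre-allocated array: the memory is reserved once, before the loop
tic
fixed = zeros(n, 2);
for k = 1:n
  fixed(k, :) = [k, 2 * k];
end
tFixed = toc;

fprintf('growing: %.3f sec, pre-allocated: %.3f sec\n', tGrow, tFixed);
```

Both loops produce the same array. Only the memory handling differs.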
Another approach:
tic
fid = fopen(file, 'r');
if fid == -1
  error('Cannot open file: %s', file);
end
dataC = textscan(fid, '%f %f', 'CommentStyle', '#');
Data = cat(2, dataC{1}, dataC{2});
fclose(fid);
toc
But this does not import the comment lines. Nevertheless, it is quite fast: 0.17 sec. Sounds perfect.
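If you want to try the textscan call without the original file, a small sketch like the following can help. The file contents are made up for this example and only assume the layout discussed here: header lines starting with # followed by two numeric columns.

```matlab
% Write a tiny sample file (contents are invented for this sketch)
file = [tempname, '.txt'];
fid = fopen(file, 'w');
if fid == -1
  error('Cannot open file: %s', file);
end
fprintf(fid, '# header line 1\n# header line 2\n1.5 2.5\n3 4\n');
fclose(fid);

% Read the numbers; lines starting with # are skipped as comments
fid = fopen(file, 'r');
if fid == -1
  error('Cannot open file: %s', file);
end
dataC = textscan(fid, '%f %f', 'CommentStyle', '#');
fclose(fid);
Data = cat(2, dataC{1}, dataC{2});
disp(Data)

delete(file);   % remove the temporary sample file
```

Data contains one row per data line, and the two header lines are dropped silently.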
Now try to solve the pre-allocation problem:
tic
fid = fopen(file, 'r');
if fid == -1
  error('Cannot open file: %s', file);
end
Block = cell(1, 10000);    % Container for the filled blocks
dataLen = 1000;            % Number of rows per block
aData = zeros(dataLen, 2); % Pre-allocated buffer for the current block
iBlock = 0;
iData = 0;
iHeader = 0;
Header = cell(100, 1);
while ~feof(fid)
  s = fgetl(fid);
  if strncmp(s, '#', 1)
    iHeader = iHeader + 1;
    Header{iHeader} = s;
  else
    iData = iData + 1;
    aData(iData, :) = sscanf(s, ' %g %g');
    if iData == dataLen    % Buffer is full: store it and start again
      iBlock = iBlock + 1;
      Block{iBlock} = aData;
      iData = 0;
    end
  end
end
fclose(fid);
Header = Header(1:iHeader);
% Store the partially filled last block and join all blocks:
iBlock = iBlock + 1;
Block{iBlock} = aData(1:iData, :);
Data = cat(1, Block{1:iBlock});
toc
Christopher, this looks ugly. Sorry. It is really ugly. Phew. There is too much clutter here, which is prone to errors. It needs 4.4 sec. That is 10 times faster than the original, but slow compared to textscan. The speed degrades further if the pre-allocated arrays are too small, while arrays that are too large cost only microseconds.
I assume that a second textscan call might be much nicer:
tic
fid = fopen(file, 'r');
if fid == -1
  error('Cannot open file: %s', file);
end
dataC = textscan(fid, ' %f %f ', 'CommentStyle', '#');
Data = cat(2, dataC{1}, dataC{2});
fseek(fid, 0, -1);
HeaderC = textscan(fid, '%s', 'CommentStyle', ' ', 'WhiteSpace', '\n');
Header = HeaderC{1};
fclose(fid);
toc
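Now the data and the header lines are imported separately, with fseek(fid, 0, -1) rewinding the file between the two reads. It takes 0.18 sec on my slow computer. I'm not happy with using the space as comment style to ignore the data lines. There might be a better filter, which reads all lines and keeps only those starting with #: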
HeaderC = textscan(fid, '%s', 'WhiteSpace', '\n');
Header = HeaderC{1};
Header = Header(strncmp(Header, '#', 1));
With this filter, the complete code runs in 0.27 sec.
Conclusion: The loops can be much faster with pre-allocation, but they cannot compete with textscan.