Read an inconsistent ASCII file into a matrix
I'd like to get maximum performance when reading a file that contains both numeric and non-numeric lines. The files typically look like this:
% comment
text 1.49
1.52 -5.3 8.9710
3.629 -5.77 9
another text and numbers
% comment again
1 2 3
and so on
The file can easily contain 1 million lines.
I would like to obtain two cell arrays:
- One that contains all rows matching %f %f %f, i.e. a numeric triplet, already parsed as doubles. Invalid lines should show up as empty entries or NaN.
- Another one that contains all rows that did not match cell array 1, still as a cellstr, preferably with leading and trailing whitespace trimmed.
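For the sample above, the two target arrays can be sketched as follows. This assumes that "matches %f %f %f" means the row consists of exactly three numeric tokens and nothing else (an interpretation, not stated in the question); non-matching rows get an empty entry in cell array 1.

```matlab
% Sample rows from the question
lines = {'% comment', 'text 1.49', '1.52 -5.3 8.9710', '3.629 -5.77 9', ...
         'another text and numbers', '% comment again', '1 2 3', 'and so on'};

num = cell(size(lines));                 % cell array 1: 1x3 doubles, [] for other rows
for k = 1:numel(lines)
    tok  = strsplit(strtrim(lines{k}));  % split the row on whitespace
    vals = str2double(tok);              % NaN for every non-numeric token
    if numel(tok) == 3 && ~any(isnan(vals))
        num{k} = vals;                   % the row is exactly a numeric triplet
    end
end
isNum = ~cellfun(@isempty, num);
text  = strtrim(lines(~isNum));          % cell array 2: remaining rows, trimmed cellstr
```

For this sample, rows 3, 4 and 7 end up in num and the other five rows in text.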
Obtaining cell array 2 is fairly simple once you have cell array 1: read every line with textscan and blank out the rows that matched cell array 1. However, I struggle to obtain cell array 1, because textscan stops reading as soon as it encounters a line that does not match the format.
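The blanking step for cell array 2 can be sketched like this. The line list and the parsed cell array 1 are hard-coded for illustration; in practice the lines could come from something like textscan(fid, '%s', 'Delimiter', '\n'). Blanking rows instead of deleting them keeps row k of both arrays referring to line k of the file.

```matlab
% Assumed inputs: every line of the file as a cellstr, and cell array 1 (num)
% holding [] for each row that was not a numeric triplet.
lines = {'% comment', '1.52 -5.3 8.9710', 'text 1.49', '1 2 3'};
num   = {[], [1.52 -5.3 8.971], [], [1 2 3]};

text = strtrim(lines);                   % cell array 2 starts as every row, trimmed
text(~cellfun(@isempty, num)) = {''};    % blank out rows already captured in cell array 1
```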
In a working example I used sscanf and parsed everything line by line. This took about 15 s for 1 million lines. Since textscan can read the whole file in less than a second, I am confident there is room for improvement.
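To compare approaches on a file of that size, one option is to repeat the sample block until it reaches a million lines. A minimal sketch; the output file name comes from tempname and is only for illustration.

```matlab
% Sample lines in the format described in the question
sample = {'% comment', 'text 1.49', '1.52 -5.3 8.9710', '3.629 -5.77 9', ...
          'another text and numbers', '% comment again', '1 2 3', 'and so on'};

nLines = 1e6;                            % file size mentioned in the question
nRep   = ceil(nLines / numel(sample));   % number of repetitions of the 8-line block
block  = sprintf('%s\n', sample{:});     % one newline-terminated block as a char row

fname = [tempname '.txt'];               % hypothetical test file in the temp folder
fid = fopen(fname, 'w');
fwrite(fid, repmat(block, 1, nRep));     % write all repetitions in a single call
fclose(fid);
```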
4 comments
Jan
on 27 Mar 2019
Please post the code of your working example. Maybe there is an obvious way to improve the performance, for example by preallocating.
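Preallocation applied to the line-by-line sscanf approach could look roughly like this. It is a sketch, not the asker's code (which was not posted): it writes a small hypothetical sample file, preallocates the output, and keeps only rows for which sscanf reports exactly three values. Note that a row such as "1 2 3 abc" would still be accepted, because sscanf stops at the first non-numeric token.

```matlab
% Write a small sample file so the sketch runs on its own (temporary, hypothetical file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '% comment', 'text 1.49', '1.52 -5.3 8.9710', '1 2 3', 'and so on');
fclose(fid);

% Read the file in one call and split it into lines (handles \n and \r\n endings).
lines = regexp(fileread(fname), '\r?\n', 'split');
lines = lines(:);                                % work with a column cellstr
if isempty(lines{end}), lines(end) = []; end     % drop the empty piece after the last newline
delete(fname);

n = numel(lines);
num = cell(n, 1);                                % preallocated instead of grown in the loop
isTriplet = false(n, 1);
for k = 1:n
    [v, cnt] = sscanf(lines{k}, '%f %f %f');
    if cnt == 3                                  % keep rows that yield exactly three numbers
        num{k} = v.';                            % store as a 1x3 row vector
        isTriplet(k) = true;
    end
end
text = strtrim(lines(~isTriplet));               % remaining rows, whitespace trimmed
```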
Tom DeLonge
on 1 Apr 2019
Tom DeLonge
on 9 Apr 2019
Accepted Answer
More Answers (1)
Unfortunately, textscan has no option to ignore invalid lines, so you will have to parse the file line by line or implement the parsing in a MEX file.
The following takes about 10 s on my machine for a million lines. It is probably similar to what you have done already:
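function [num, text] = parsefile(filename)
% Read the whole file at once and split it into lines.
lines = strsplit(fileread(filename), '\n');
% Parse each line as numbers; lines that do not start with a number give [].
num = cellfun(@(l) sscanf(l, '%f %f %f')', lines, 'UniformOutput', false);
% Lines without any parsed numbers are the text lines, trimmed of surrounding whitespace.
text = strtrim(lines(cellfun(@isempty, num))); % cellfun('isempty', num) is marginally faster
end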
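One possible speed-up, not part of the answer above: classify all rows with a single regexp call over the cellstr, then parse every triplet row with one sscanf call instead of one call per line. The number pattern is an assumption (plain decimal and exponent notation only, no Inf, NaN or hex), so a few rows may be classified differently than with sscanf, and its speed has not been measured here.

```matlab
% Hypothetical sample content; in practice: txt = fileread(filename);
txt = sprintf('%s\n', '% comment', 'text 1.49', '1.52 -5.3 8.9710', ...
              '3.629 -5.77 9', 'another text and numbers', '1 2 3', 'and so on');

lines = regexp(txt, '\r?\n', 'split').';
if isempty(lines{end}), lines(end) = []; end      % drop the empty piece after the last newline

% A row is a triplet if it is exactly three whitespace-separated numbers (simplified pattern).
numPat = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';
triPat = ['^\s*' numPat '\s+' numPat '\s+' numPat '\s*$'];
isTriplet = ~cellfun(@isempty, regexp(lines, triPat, 'once'));

% Parse all triplet rows with ONE sscanf call instead of one call per line.
vals = sscanf(sprintf('%s\n', lines{isTriplet}), '%f');
M = reshape(vals, 3, []).';                       % one row per triplet line

num = cell(size(lines));                          % cell array 1, [] for non-triplet rows
num(isTriplet) = num2cell(M, 2);
text = strtrim(lines(~isTriplet));                % cell array 2
```

Because the regexp guarantees exactly three numbers per selected row, the reshape into an n-by-3 matrix is safe; M is also handy directly if a numeric matrix rather than a cell array is wanted.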
1 comment
Tom DeLonge
on 27 Mar 2019