Limit to Textscan?
Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully to Matlab using the textscan function.
Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were of the same length. But now, each data column has its own time column (which does not line up with the other data), and the length of each data column is different from the others. I've made additions to my script so that it also reads in all the corresponding times for each data column, but I've now discovered that, for some reason, it doesn't read the whole file. It will read the file until about row 123, even though some columns go up to row 247, and some go up to 641. So I'm just curious if this is a limitation of the textscan function, or if the new code I added is funky.
1 Comment
Oleg Komarov
9 May 2012
Next time, please do not create additional answers, since it has become impossible to follow who has answered what and to collect all the info you supplied. Please use comments and/or edit your original answer.
Accepted Answer
Geoff
9 May 2012
Thanks for clarifying what your data looks like.
I assume that comma immediately after the '4' is a mistake. You could probably do this with a regexp... Because each comma denotes a pair of values. I take it that if the value before the comma is missing then the value after is also missing.
Do you have a fixed number of columns? If so, are the commas always there?
If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:
lines = {'1, 2 3, 4 5, 6'
'1, 2 3, 4 5, 6'
'1, 2 3, 4 , '
' , 3, 4 , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');
This extracts word-like strings with optional spaces and the obligatory comma.
What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...
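To make that nesting concrete, here is a minimal sketch using two of the sample lines above. The variable names nPairs, secondPair and lastPair are only illustrative and are not part of the original answer:

```matlab
% Two of the sample lines from above: one complete, one with a missing pair
lines = {'1, 2 3, 4 5, 6'
         '1, 2 3, 4 , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');

% toks{r} holds the pairs found on line r; each pair is a 1x2 cell of strings
nPairs     = numel(toks{1});      % how many pairs were found on line 1
secondPair = toks{1}{2};          % the strings of the second pair on line 1

% str2double accepts a cell of strings; empty strings convert to NaN
lastPair = str2double(toks{2}{end});
```

Indexing as toks{r}{p}{k} therefore picks line r, pair p, and element k (1 for the value before the comma, 2 for the value after it).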
I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.
[EDIT]
The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:
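% Match either a complete "value, value" pair or a completely empty pair
toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
    % Flatten this line's token pairs and convert them to doubles
    rows{r} = str2double([toks{r}{:}]);
end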
Now you have a cell with one row per line, containing a vector of doubles...
This won't work with other rubbish in your data like % signs, but you can either filter that or allow for it in the regular expression....
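As a rough sketch of both options, assuming the only unwanted character is '%' and that values may also carry a sign or a decimal point (which \w alone does not match). The sample line and the widened character class are illustrative assumptions, not taken from the asker's file:

```matlab
% Hypothetical sample line with a decimal, a negative number and a stray '%'
lines = {'1.5, 2% 3, -4'};

% Option 1: filter the unwanted character before matching
clean = strrep(lines, '%', '');

% Option 2: widen the token pattern so signs and decimal points are accepted
toks = regexp(clean, '\s*([-+.\w]+)\s*,\s*([-+.\w]+)', 'tokens');

% Flatten the pairs of the first (only) line and convert them to doubles
vals = str2double([toks{1}{:}]);
```

Anything the pattern accepts but str2double cannot interpret still comes back as NaN, so the widened class stays safe for stray text.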
And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
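One way to force equal lengths, sketched under the assumption that padding short rows on the right with NaN is acceptable for your data. The rows cell below is a made-up stand-in for the parsed result:

```matlab
% Hypothetical parsed rows of unequal length (one row vector per line)
rows = {[1 2 3 4 5 6]; [1 2 3 4]; [3 4]};

% Find the longest row, then right-pad the shorter ones with NaN
maxLen = max(cellfun(@numel, rows));
padded = cellfun(@(v) [v, nan(1, maxLen - numel(v))], rows, ...
                 'UniformOutput', false);

% Every row now has the same width, so cell2mat can stack them
M = cell2mat(padded);
```

Right-padding only makes sense if missing values always sit at the end of a line; if they can appear in the middle, the positions need to be preserved during parsing instead.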
More Answers (7)
Geoff
8 May 2012
I doubt there is a limit for the tiny numbers you're talking about.
What I expect has happened is that textread encountered some text that did not fit the format and was not listed as a possible delimiter.
Check your data file near the last line that you think was successfully parsed.
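A quick way to find that spot, sketched here with textscan and a small temporary file whose name and contents are made up for illustration, is to ask the file handle what is left unread once the scan returns:

```matlab
% Build a small hypothetical file whose third line does not fit the format
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2\n3,4\nbad,5\n6,7\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%f %f', 'Delimiter', ',');   % returns early at the bad field
nRead     = numel(C{1});   % rows parsed before the scan gave up
stoppedAt = ftell(fid);    % byte offset where reading stopped
rest      = fgetl(fid);    % the next unread text, which should show the culprit
fclose(fid);
delete(fname);
```

If nRead is smaller than the number of lines in the file, displaying rest (or opening the file at that row) usually reveals the text that did not fit the format.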
3 Comments
Geoff
9 May 2012
From your descriptions it's hard to envisage what your data looks like, and you haven't shown your textread() call. If you want your data in a matrix, then it has to be the width/height of your largest column and row number. If you want a variable width, you'll need to read into a cell array. I'd recommend using fgetl() with textread() on a per-line basis... Other functions worth checking out are sscanf(), regexp() or textscan().
Walter Roberson
9 May 2012
textread() is not recommended; it will be removed from MATLAB.
textscan() is its replacement.
Walter Roberson
9 May 2012
MATLAB does not provide any facilities that can deal with reading field-wise from blocks of text with an inconsistent number of fields, unless all of the fields are the same numeric format and everything is to be read as one continuous stream, ignoring line boundaries.
To read row-wise with inconsistent number of fields, one must read entire lines and parse them afterwards.
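A minimal sketch of that read-then-parse approach, assuming comma-separated numeric fields. The temporary file and its contents are made up for illustration:

```matlab
% Write a small hypothetical file with a different number of fields per line
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2,3\n4,5\n6\n');
fclose(fid);

% Read whole lines first, then parse each one on its own
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);
while ischar(tline)                          % fgetl returns -1 at end of file
    fields = regexp(tline, ',', 'split');    % split the line on commas
    rows{end+1, 1} = str2double(fields);     % one numeric row vector per line
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Storing each line in its own cell avoids having to know the widest row in advance; the rows can be padded or trimmed afterwards if a matrix is needed.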
Ken Atwell
9 May 2012
That is an unusual file format. If I read you correctly, you have a file I would describe as "ragged down"... a consistent number of columns, but the number of rows per column is variable. Is that right? I'm assuming the columns are delimited with commas, tabs or such; something like (whitespace added):
11, 12, 13
21, , 23
, , 33
In this trivial example, textscan would stop processing at the first missing value (in the second row here). You can call textscan again with the same file handle and it will continue where it left off, but I imagine you will find it difficult to recover from the missing value.
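Whether textscan stops at an empty field can depend on how it is called. As a rough sketch, assuming the fields are strictly comma-delimited, it may be worth trying an explicit 'Delimiter' first, together with the documented 'EmptyValue' option (NaN by default), on the three-line sample above:

```matlab
% The three-line sample from above, held in a string instead of a file
txt = sprintf('11, 12, 13\n21, , 23\n, , 33');

% With an explicit delimiter, empty numeric fields are filled with EmptyValue
C = textscan(txt, '%f %f %f', 'Delimiter', ',', 'EmptyValue', NaN);

% Put the columns side by side (only valid if all came back the same length)
A = [C{:}];
```

If the columns come back with different lengths, the scan did stop early on your data, and reading line by line is the safer route.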
Depending on the version of MATLAB you are using, I would try importing the file into MATLAB... it may just do the right thing, and you can then generate a script from there to create a programmatic solution.
If that doesn't work out, another solution would be to read the file line-by-line, splitting on the delimiter (comma here). And, in this case, I want to convert from strings to doubles. Here is some code to import the data I've included here:
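f = fopen('input.dat');
A = [];
while ~feof(f)
    tline = fgetl(f);                    % read one whole line as text
    r = regexp(tline, ',', 'split');     % split it on the comma delimiter
    % Empty or non-numeric fields become NaN; every line must have
    % the same number of fields, or this assignment will error
    A(end+1,:) = str2double(r);
end
fclose(f);
A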
Missing values are represented by NaNs in A.
3 Comments
iffi
27 Dec 2012
f = fopen('input.dat');
A=[];
while ~feof(f)
l = fgetl(f);
r = regexp(l, ',', 'split');
A(end+1,:) = str2double(r);
end
fclose (f);
A
This code reads the file well, but I also have some data in this form, e.g. V567, V1528, ...
For these entries the code also gives me NaN, in addition to the NaNs for the missing values.
Walter Roberson
27 Dec 2012
It appears you are starting a new topic. Please create a new Question for this. You can refer to this existing topic as giving ideas.
Ying
9 May 2012
2 Comments
Walter Roberson
9 May 2012
Are the columns fixed width? If they are not, there is logical difficulty in distinguishing between " 3" and "3 ".
per isakson
9 May 2012
Does the data block of the file have a format something like this?
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space , time_stamp, value
Is "space" only char(32)? Is there no tab, char(9)? Does the "time_stamp" have a special format that distinguishes it from "value"? Do the columns have a fixed width, as in my example above?
If you know how many header lines there are, you can read them with fgetl or textscan.
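For the textscan route, a minimal sketch using its 'HeaderLines' option. The two-line header, the file contents and the variable nHeader are assumptions made for illustration:

```matlab
% Build a small hypothetical file with two header lines before the data
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Acquisition log\ntime,value\n0.1,10\n0.2,20\n');
fclose(fid);

nHeader = 2;   % assumed number of header lines in this file
fid = fopen(fname, 'r');
% 'HeaderLines' tells textscan to skip that many lines before parsing
C = textscan(fid, '%f %f', 'Delimiter', ',', 'HeaderLines', nHeader);
fclose(fid);
delete(fname);

t = C{1};   % first column (time stamps)
v = C{2};   % second column (values)
```

With fgetl, the equivalent is simply to call it nHeader times before parsing the rest of the file.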