Limit to Textscan?

Question

Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully into MATLAB using the textscan function.

Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were the same length. But now each data column has its own time column (which does not line up with the other data), and the data columns differ in length. I've made additions to my script so that it also reads in the corresponding times for each data column, but I've discovered that, for some reason, it no longer reads the whole file. It reads the file until about row 123, even though some columns go up to row 247 and some go up to row 641. So I'm curious whether this is a limitation of the textscan function, or whether the new code I added is funky.

Oleg Komarov on 9 May 2012

Next time, do not create additional answers, since it becomes impossible to follow who has answered what and to collect all the info you supplied. Please use comments and/or edit your original answer.

Answer 1

Geoff on 9 May 2012

Thanks for clarifying what your data looks like.

I assume the comma immediately after the '4' is a mistake. You could probably do this with a regexp, because each comma denotes a pair of values. I take it that if the value before the comma is missing, then the value after it is also missing.

Do you have a fixed number of columns? If so, are the commas always there?

If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:

lines = {'1, 2  3, 4  5,  6'
         '1, 2  3, 4  5,  6'
         '1, 2  3, 4   ,  '
         ' ,    3, 4  , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');

This extracts word-like strings with optional spaces and the obligatory comma.

What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...

I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.

[EDIT]

The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:

toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
  rows(r) = { str2double([toks{r}{:}]) };
end

Now you have a cell with one row per line, containing a vector of doubles...

This won't work with other rubbish in your data like % signs, but you can either filter that or allow for it in the regular expression....

And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
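For the "force them to be the same length" step, here is a minimal sketch. It assumes short rows are missing values at the end only, so padding on the right with NaN is safe; the `rows` data below is made up for illustration.

```matlab
% Hypothetical parsed rows of unequal length (made-up data)
rows = {[1 2 3 4 5 6]; [1 2 3 4]; [3 4]};

% Pad every row with NaN up to the length of the longest row
maxLen = max(cellfun(@numel, rows));
padded = cellfun(@(v) [v, nan(1, maxLen - numel(v))], rows, ...
                 'UniformOutput', false);

% All rows are now 1-by-maxLen, so cell2mat can stack them
M = cell2mat(padded);
```

If values can be missing in the middle of a row, padding on the right would shift columns, so the gaps need to be turned into NaN during parsing instead.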

Answer 2

Geoff on 8 May 2012

I doubt there is a limit for the tiny numbers you're talking about.

What I expect has happened is that textread encountered some text that did not fit the format and was not listed as a possible delimiter.

Check your data file near the last line that you think was successfully parsed.
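One way to find that spot, sketched here with textscan (which the question uses) on a made-up temporary file: with its default settings textscan returns without an error when a field does not fit the format, and reading the next line from the same file handle should show roughly where it gave up. The file contents and variable names are assumptions for illustration.

```matlab
% Write a small hypothetical file; the third line has text that %f cannot parse
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2\n3,4\nabc,6\n7,8\n');
fclose(fid);

fid = fopen(fname, 'r');
% Second output is the file position at which textscan stopped
[C, pos] = textscan(fid, '%f%f', 'Delimiter', ',');
% The next unread text is the place to inspect in the raw file
offending = fgetl(fid);
fclose(fid);
delete(fname);

fprintf('Rows parsed: %d, stopped at byte %d, next text: "%s"\n', ...
        numel(C{1}), pos, offending);
```

Comparing the number of rows parsed with the number of lines in the file tells you whether the whole file was read.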

Geoff on 9 May 2012

From your descriptions it's hard to envisage what your data looks like, and you haven't shown your textread() call. If you want your data in a matrix, then it has to be the width/height of your largest column and row number. If you want a variable width, you'll need to read into a cell array. I'd recommend using fgetl() with textread() on a per-line basis... Other functions worth checking out are sscanf(), regexp() or textscan().

Walter Roberson on 9 May 2012

textread() is not recommended; it will be removed from MATLAB.

textscan() is its replacement.

Answer 3

Walter Roberson on 9 May 2012

MATLAB does not provide any facilities for reading field-wise from blocks of text with an inconsistent number of fields, unless all of the fields have the same numeric format and everything is to be read as one continuous stream, ignoring line boundaries.

To read row-wise with inconsistent number of fields, one must read entire lines and parse them afterwards.
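A minimal sketch of that read-then-parse approach, assuming each line holds "time, value" pairs and that pairs are only ever missing from the end of a line. The text is built in memory here to keep the example self-contained; in practice it would come from the file (for instance via fileread or fgetl).

```matlab
% Hypothetical file contents: rows with 3, 2 and 1 time/value pairs
txt = sprintf('1, 2  3, 4  5, 6\n1, 2  3, 4\n3, 4\n');

% Step 1: split the text into whole lines
lines = regexp(strtrim(txt), '\r?\n', 'split');

% Step 2: parse each line on its own
parsed = cell(numel(lines), 1);
for k = 1:numel(lines)
    % %f skips leading whitespace; the literal comma consumes the separator;
    % the format is reused until the line is exhausted
    parsed{k} = sscanf(lines{k}, '%f,%f', [1 Inf]);
end
```

Note that sscanf stops at the first thing it cannot match, so a line with an empty pair in the middle (" , ") would be cut short. The regexp approach in Answer 1 deals with that case.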

Answer 4

Ken Atwell on 9 May 2012

That is an unusual file format. If I read you correctly, you have a file I would describe as "ragged down"... a consistent number of columns, but the number of rows per column is variable. Is that right? I'm assuming the columns are delimited with commas, tabs or such; something like (whitespace added):

 11, 12, 13
 21,   , 23
 ,   , 33

In this trivial example, textscan would stop processing at the first missing value (in the second row here). You can call textscan again with the same file handle and it will continue where it left off, but I imagine you will find it difficult to recover from the missing value.

Depending on the version of MATLAB you are using, I would try importing the file into MATLAB... it may just do the right thing, and you can then generate a script from there to create a programmatic solution.

If that doesn't work out, another solution would be to read the file line-by-line, splitting on the delimiter (comma here). And, in this case, I want to convert from strings to doubles. Here is some code to import the data I've included here:

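 f = fopen('input.dat');           % open the data file for reading
 A = [];
 while ~feof(f)
    l = fgetl(f);                  % read one line, without the newline
    r = regexp(l, ',', 'split');   % split the line on commas
    % Blank fields convert to NaN; every line must have the same number of fields
    A(end+1,:) = str2double(r);
 end
 fclose(f);
 A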

Missing values are represented by NaNs in A.

iffi on 27 Dec 2012

 f = fopen('input.dat');
 A=[];
 while ~feof(f)
    l = fgetl(f);
    r = regexp(l, ',', 'split');
    A(end+1,:) = str2double(r);
 end
 fclose (f);
 A

This code reads the file well, but I also have some data of the form V567, V1528, ... For such entries the code also gives me NaN, in addition to the NaNs for missing values.

Walter Roberson on 27 Dec 2012

It appears you are starting a new topic. Please create a new Question for this. You can refer to this existing topic as giving ideas.

Answer 5

Ying on 9 May 2012

Thanks for the responses. Ken, as for trying to import the file into MATLAB, I could not do it successfully because I have multiple delimiters in my data. The data looks like the following:

 1, 2  3, 4  5,  6
 1, 2  3, 4  5,  6
 1, 2  3, 4   ,  
  ,    3, 4,  ,

The data is weird in that a comma separates the time and data columns for one variable, and a space separates that pair from the next time/data pair. So in this example columns 1, 3, and 5 are times, and columns 2, 4, and 6 are the respective data that the times correspond to. Each set ends at a different time. Right now my textscan always ends at the shortest set (5, 6 in this example). Is it possible to just change my delimiters so that it reads the whole file? Or should I try the line-by-line read option?

Walter Roberson on 9 May 2012

Are the columns fixed width? If they are not, there is logical difficulty in distinguishing between " 3" and "3 ".

Ying on 9 May 2012

I don't know. I do know that it reads everything fine up to the shortest column, though.

Answer 6

Ying on 9 May 2012

That's correct, Geoff, the comma after the 4 is a typo. The number of columns is somewhat fixed. What I mean is that it's controllable: I can choose how many variables to track, but if I want more or fewer variables then I have to change the script to match. The commas are always there, between each time and the data it corresponds to.

Oh, and since you asked earlier, this is my textscan line:

 datanew = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','Delimiter','\t%,','HeaderLines',2);

So as you can see I have around 52 columns. Not the prettiest or most ideal way to do it, I know. I wanted to use import, but textscan seems to be the only way I've gotten it to work.
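A long run of identical specifiers like this can be generated rather than typed, which also makes the column count easy to change. This is only a sketch: `nCols` is an assumed value taken from the "around 52 columns" remark and has to match the actual file.

```matlab
nCols = 52;                      % assumed column count; set this to match the file
fmt = repmat('%f', 1, nCols);    % '%f%f...%f' with nCols specifiers

% The generated string can then replace the hand-typed one, e.g.:
% datanew = textscan(fid, fmt, 'Delimiter', '\t%,', 'HeaderLines', 2);
```

Tracking more or fewer variables then only means changing `nCols`.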

Ying on 9 May 2012

How would I account for header lines and column names?

Geoff on 10 May 2012

Read the first line and process it the same way. Are the headers separated by the same "comma-sometimes" strategy? You could use the same regexp code I gave you as long as a single header does not contain a space.
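As a rough sketch of that idea, assuming the header uses the same "name, name  name, name" layout as the data. The header text below is invented for illustration; with a real file it would come from a call such as fgetl(fid) before the data is read.

```matlab
% Hypothetical header line (made up); no single name contains a space
headerLine = 't1, v1  t2, v2  t3, v3';

% Same pair pattern as for the data: two word-like names around a comma
pairs = regexp(headerLine, '\s*(\w+)\s*,\s*(\w+)', 'tokens');

% Flatten the 1-by-3 cell of pairs into a 1-by-6 cell of names
names = [pairs{:}];
```

Because `\w+` stops at whitespace, a header such as "flow rate" would be split incorrectly, which is the restriction mentioned above.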

Answer 7

per isakson on 9 May 2012

Does the data block of the file have a format something like this?

    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space            ,              time_stamp, value

Is "space" only char(32)? There isn't a tab, char(9)? Does the "time_stamp" have a special format that can be distinguished from "value"? Do the columns have fixed width, as in my example above?

If you know how many header lines there are, you can read them with fgetl or textscan.

Answer 8

Ying on 9 May 2012

I think I was able to make it work by reading in all values as strings instead of floating-point numbers, making them all the same length, and then using str2num to convert the strings back to numbers. Now I just have to get it to work with the rest of the script.

Geoff on 10 May 2012

Use str2double()
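A short illustration of why, using made-up strings: str2double accepts a cell array of strings directly and returns NaN for anything that is not a plain number, whereas str2num evaluates its input with eval, so unexpected text in a file could run as code.

```matlab
% Hypothetical field values: a number, an empty field, a label, a padded number
strs = {'1.5', '', 'V567', ' 42 '};

% One call converts the whole cell array; non-numeric entries become NaN
vals = str2double(strs);

% Empty or non-numeric fields can then be located with isnan
missing = isnan(vals);
```

Here the empty field and the 'V567' label both end up as NaN, so labels like that would need to be handled separately if they carry information.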
