textscan failing to read data in text file

Question

UniqueWorldline on 15 Oct 2017


Direct link to this question

https://es.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file

Commented: Cedric on 21 Oct 2017

Accepted answer: Cedric

I have a text file with a fileID called fidRawData that contains rows that look like this:

A BCD 99.9 9.90 9.999 99.9 0.999 0.99 9.999 9.999 99.99 99.9 9.9

A can be one of two characters ('A' or 'B'), or it can be missing, in which case a space is inserted in its place, leaving white space at the beginning of the row. This first character can vary by row. BCD is a three-letter code that can also vary by row. I want to treat the subsequent numeric columns as generally as possible, but none of the values will ever get large: they should all be between -9999 and 9999.

Sometimes an error occurs and '---' is inserted in place of some of the numbers in a given row, like this:

A BCD 99.9 9.90 9.999 --- --- 0.99 9.999 9.999 99.99 99.9 9.9

The only thing I can really be sure of is that there will always be at least one space between the columns; there may be more than one. The numbers vary in sign, in where the decimal point falls, and in magnitude.

I need to use either textscan or fscanf (I would prefer to use textscan for its greater flexibility) to store all the data in each of these columns (including the textual information in the first two columns) in whatever data type will accept such a diverse range of simpler data types and allow me to easily retrieve the data.

Whenever an 'A' is omitted and a ' ' is put in its place, I am OK with an 'N' or other character taking its place if need be, but if there is an 'A' or a 'B', I want that stored as 'A' or 'B' respectively.

When a '---' shows up, I want to replace it with NaN, an empty location in the data structure, or some other indication that there is no data available.

I tried the following command on a single row that had an 'A' at the beginning and no '---' anywhere in it:

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f')

This command worked as expected. It returned a 1x14 cell array where all the values in the text file were stored as I wanted in rawData.

But there are plenty of rows without an 'A' or 'B' and with '---' present at least once. To try to address these variations, I tried the following on a row where both conditions are true:

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f %f %f', 'Delimiter', ' ', 'EmptyValue', 0)

This test results in a 1x14 cell array that is completely empty. The cells are either 1x1 cell type cells and contain a 0x0 char array, or they are 0x1 double cells.

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f')

worked up until it hit the '---' in the row, then began returning 0x1 double cells for the remaining columns of rawData.

What can I do to get textscan to deal with these possibilities?


Answer 1

Cedric on 15 Oct 2017


Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file#answer_285825

Edited: Cedric on 15 Oct 2017

data.txt

Here is one way. We pre-process the content before parsing, adding 'N' where the first letter is missing. Then we count the number of columns, split the content on white spaces, and reshape the output according to the number of columns. Finally we extract the header (or those first two char columns) and convert the rest to double.

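content = fileread( 'data.txt' ) ;
% Insert 'N' where a row starts with white space (missing first letter), then
% trim so a trailing newline does not produce an empty token when splitting.
content = strtrim( regexprep( content, '^\s', 'N ', 'lineanchors' ) ) ;
% Count the columns in the first row.
nCols   = numel( strsplit( regexp( content, '[^\r\n]+', 'match', 'once' ), ' ')) ;
% Split on any white space and reshape to one row of cells per line of the file.
data    = reshape( regexp(content, '\s+', 'split'), nCols, [] ).' ;
header  = data(:,1:2) ;                  % first two (text) columns
data    = str2double( data(:,3:end) ) ;  % numeric columns; '---' becomes NaN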

Applied to the file attached, we get:

 >> header
 header =
  5×2 cell array
    {'A'}    {'BCD'}
    {'B'}    {'BCD'}
    {'N'}    {'BCD'}
    {'B'}    {'BCD'}
    {'N'}    {'BCD'}
 >> data
 data =
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990   99.9000    0.9990    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
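
The NaN entries above come from str2double itself: it returns NaN for any token it cannot parse as a number, so the '---' markers need no special handling. A minimal sketch with made-up tokens (not taken from the attached file):

```matlab
% Illustrative tokens, as they might look after splitting one row on white space.
tokens = {'99.9', '---', '-0.5', '9.90'};

% str2double converts each cell element; unparseable text becomes NaN.
values = str2double(tokens);

% Logical mask marking where data was unavailable.
missing = isnan(values);
```

The missing mask can then be used to skip or count the unavailable readings, for example with nnz(missing).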


UniqueWorldline on 21 Oct 2017

Thank you very much @Cedric Wannaz. I may have some follow-up questions, which I will ask in a new thread that links back to this question, but your code has solved 99% of my problems analyzing this data.

Cedric on 21 Oct 2017

My pleasure!

Answer 2

Walter Roberson on 17 Oct 2017


Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file#answer_286302

If you already know the number of numeric columns (perhaps having parsed the file the way Cedric shows), there is a trick you can use:

S  = 'A BCD   99.9   9.90 9.999 99.9 0.999  0.99  9.999  9.999  99.99 99.9  9.9';  % sample input
S1 = '  BCD   99.9   9.90 9.999 99.9 ---  ---  9.999  9.999  --- 99.9  9.9';  % another sample input; the leading space is important
NumNumeric = 11;   % number of numeric columns
SP = '%*[ ]';      % read a run of spaces and discard it
fmt = ['%c', SP, '%s', repmat([SP '%f'], 1, NumNumeric)];
% With 'whitespace' set to '', the leading space is not skipped, so %c can capture it.
textscan(S, fmt, 'treatasempty', '---', 'whitespace','')
textscan(S1, fmt, 'treatasempty', '---', 'whitespace','')

These give

ans =
  1×13 cell array
    {'A'}    {'BCD'}    {[99.9]}    {[9.9]}    {[9.999]}    {[99.9]}    {[0.999]}    {[0.99]}    {[9.999]}    {[9.999]}    {[99.99]}    {[99.9]}    {[9.9]}
ans =
  1×13 cell array
    {' '}    {'BCD'}    {[99.9]}    {[9.9]}    {[9.999]}    {[99.9]}    {[NaN]}    {[NaN]}    {[9.999]}    {[9.999]}    {[NaN]}    {[99.9]}    {[9.9]}

This approach does not require pre-processing to replace the missing leading character.
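
If a visible placeholder is still wanted afterwards, the blank can be swapped once parsing is done. A minimal sketch, assuming the first column has been collected into a char column vector (the name flags and its contents are hypothetical):

```matlab
% Hypothetical first column as a char column vector; ' ' marks a missing letter.
flags = ['A'; ' '; 'B'; ' '];

% Logical indexing replaces every blank with the placeholder 'N'.
flags(flags == ' ') = 'N';
```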

I show here scanning from a string; you can fopen() the file and pass the file identifier where I show the string.
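
To see what the format string built above actually contains, it can help to expand it for a small column count. A sketch assuming only two numeric columns instead of eleven; '%*[ ]' reads a run of spaces and discards it:

```matlab
SP = '%*[ ]';          % read a run of spaces without storing it
NumNumeric = 2;        % assumed here for brevity; the sample rows have 11
fmt = ['%c', SP, '%s', repmat([SP '%f'], 1, NumNumeric)];
disp(fmt)              % displays %c%*[ ]%s%*[ ]%f%*[ ]%f
```

Each '%f' is preceded by its own space-skipping field, which is why the number of numeric columns has to be known before the format can be built.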


Cedric on 21 Oct 2017

Edited: Cedric on 21 Oct 2017

Neat, I had forgotten about it!
