How to parse poorly formatted txt file?


Jess on 29 Mar 2017


https://es.mathworks.com/matlabcentral/answers/332640-how-to-parse-poorly-formatted-txt-file

Commented: Jess on 4 Apr 2017

Accepted Answer: Walter Roberson

ArrayData.txt

Hello,

I'm working with modeling software that outputs data in badly formatted .txt files; there is a screenshot of the output below. Each output file contains 10,000 data blocks, each of which begins with the highlighted "1tally" string and ends several lines later with a decimal number.

Ideally, I need to pull the 1tally string, the number that follows it (14 and 24 in the picture), and the remaining two values in each data block (6.47566E-07 0.0187 and 6.93514E-07 0.0181). I've tried using textscan options to locate the 1tally string, but I'm not familiar enough with MATLAB to write the loop that keeps seeking the remaining 9,999 data blocks. I can't use the 'HeaderLines' option because the entries are not on the same row in every file, and even within a single file the number of rows between data blocks varies anywhere from 1 to 500.

Any help or advice would be greatly appreciated.

Edit: I can't post the full output file, but I've attached a shortened version. The formatting is the same as in the files I need the code for. The only difference would be the number of rows between the start of the file and the first occurrence of 1tally.

2 Comments

Stephen23 on 30 Mar 2017

@Jess: can you please edit your question and upload a sample file by clicking the paperclip button above the textbox. Then we can test code on a real file.

Jess on 30 Mar 2017

@Stephen Cobeldick: thanks for the advice, I've added an example of the file.


Accepted Answer

Walter Roberson on 30 Mar 2017


https://es.mathworks.com/matlabcentral/answers/332640-how-to-parse-poorly-formatted-txt-file#answer_260947


I would probably use fileread() to read the entire file into a string, and then use regexp() with named tokens. It might not be too bad; something like:

regexp(S, '(?<=1tally\s+)(?<tallyno>\d+)(?:.*?)(?<last2>\S+)(?:\s+)(?<last1>\S+)(?=\s+=)', 'names')

This looks for 1tally followed by whitespace, then puts the decimal digits that follow into the field 'tallyno'. Then it skips as few characters as possible to satisfy what comes after. Then it captures a run of non-whitespace characters into a field named 'last2', skips whitespace, and captures another run of non-whitespace characters into a field named 'last1'. Finally, a lookahead requires that whitespace and then an "=" follow, without including them in the match.

I would need an extract of the file to test the expression to be certain.

10 Comments (8 older comments hidden)

Walter Roberson on 4 Apr 2017

Edited: Walter Roberson on 4 Apr 2017

Try creating a file parsemcn.awk with the following content:

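# On a line that starts with "1tally" and contains a digit, print the tally number (field 2)
/^1tally.*[0-9]/ {print $2};
# On a line that starts with " cell", read the next line and print its first two fields
/^ cell/ {getline; print $1; print $2}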

Then run:

gawk -f parsemcn.awk ArrayData.txt > SomeOutputFile.txt
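To try the script without the real data, here is a minimal sketch that runs the same two rules on a made-up file. The sample lines (the "nps = 1000" text, the "tally type" line, the filler lines) are assumptions for illustration only, since the actual ArrayData.txt is not shown in this thread; only the "1tally" and " cell" line prefixes and the example values come from the question and the script. It calls plain awk, which should treat these two rules the same way as gawk, and uses paste to fold the three printed values of each block onto one row.

```shell
#!/bin/sh
# Hypothetical miniature of the output layout; the real file may differ.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
 some header text
1tally  14        nps = 1000
           tally type 4
 cell  1
                 6.47566E-07 0.0187

 filler line
1tally  24        nps = 1000
           tally type 4
 cell  2
                 6.93514E-07 0.0181
EOF

# The two rules from the answer, saved as parsemcn.awk.
cat > "$workdir/parsemcn.awk" <<'EOF'
/^1tally.*[0-9]/ {print $2}
/^ cell/ {getline; print $1; print $2}
EOF

# Run the script, then join every three output lines into one row per block.
result=$(awk -f "$workdir/parsemcn.awk" "$workdir/sample.txt" | paste -d' ' - - -)
echo "$result"
rm -r "$workdir"
```

With one row per block, the output can then be loaded as a three-column numeric table, provided every block really yields exactly three values.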

Jess on 4 Apr 2017

That did the trick! You have no idea just how much time you've saved me with your help and advice!
