Import irregular and nonpatterned text data into a matrix form

Question

0 votes

I am sure this is covered elsewhere but I have been unable to find it. I am interested in creating a parser that will extract data from a text file and put it into a matrix where it will be viewable in Excel. A sample of my data is below:

Failure Report

Time: 00:12:34

Fault ID: Converter fail

Fault Description: Channel 4 of the converter has failed during tuning

\n

Failure Report

Time: 00:12:37

Fault ID: Comparator 4 Fail

Fault Description: comparator 4 has failed

\n

Failure Report

Time: 00:12:39

Fault ID: Converter fail in practice

Fault Description: Channel 4 of the converter has failed when in mode 5

\n

Failure Report

Time: 00:12:45

Fault ID: Converter 12 Fail

Fault Description: Converter 12 has failed because x = -2 and y = -4

So far I have used an if statement with regexp to find the 'Failure Report' and then go into that message. I can easily extract the time (because it is consistent) with textscan, but I am having problems wrapping my head around how to pull out the Fault ID and the Fault Description because they are irregular in size and type of data in them. Does anyone have a suggestion on how I could go about getting this information in a format like this?

Time________________Fault ID________________Failure Report

00:12:34____________Converter fail__________Channel 4 of the converter has failed during tuning

00:12:37____________Comparator 4 Fail________Comparator 4 has failed

Any help would be greatly appreciated.

~Jenn

2 comments

Jennifer on 24 Oct 2013

I finally got a chance to try this in my code and I got the error: "Error using cellfun: Input #2 expected to be a cell array, was char instead"

There is no value for x but there are values for ii, len and isin. Suggestions?

Kelly Kearney on 28 Oct 2013

How did you read in your file? I assumed that your data would be a cell array, with one string per line, such as would be created by, say

    fid = fopen('file.txt');
    C = textscan(fid, '%s', 'delimiter', '\n');   % named C so it does not clash with data in the parse code
    fclose(fid);
    filetext = C{1};

but it seems you have a character array instead, from, maybe

    filetext = fileread('file.txt');

Try running:

    filetext = regexp(filetext, '\n', 'split')';

then trying the parse portion of my code.
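If it is unclear which form the text is in, the check can be folded into the code. The following is a minimal sketch, not part of the original answer: the sample string is built from the question's data, and Windows (CRLF) line endings are assumed purely for illustration. It splits on either LF or CRLF and then drops empty lines.

```matlab
% Sample text as fileread might return it (CRLF line endings assumed here)
filetext = sprintf('Failure Report\r\nTime: 00:12:34\r\nFault ID: Converter fail\r\n');

if ischar(filetext)
    % Split on LF or CRLF so Windows files do not leave a trailing carriage return
    filetext = regexp(filetext, '\r?\n', 'split')';
end

% Drop empty lines, e.g. the empty piece after the final line break
filetext = filetext(~cellfun('isempty', filetext));
```

After this, `filetext` is a column cell array with one string per line, which is the form the parse code expects.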


Answer 1

Kelly Kearney on 20 Sep 2013

1 vote

Assuming I've interpreted your file format correctly, you probably don't need regexp. My parsing below assumes that 1) each entry has the same fields (time, ID, description, etc.), 2) each field is on its own line, and 3) each field of interest consists of a key phrase followed by a colon, a space, and the string of interest.

    % Data (assuming file is read into a cell array)
    filetext = {...
    'Failure Report'
    'Time: 00:12:34'
    'Fault ID: Converter fail'
    'Fault Description: Channel 4 of the converter has failed during tuning'
    '\n'
    ''
    'Failure Report'
    'Time: 00:12:37'
    'Fault ID: Comparator 4 Fail'
    'Fault Description: comparator 4 has failed'
    '\n'
    ''
    'Failure Report'
    'Time: 00:12:39'
    'Fault ID: Converter fail in practice'
    'Fault Description: Channel 4 of the converter has failed when in mode 5'
    '\n'};
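    % Parse: for each marker, keep the lines that start with it and strip
    % the marker plus the 2-character separator (colon and space) after it
    markers = {'Time', 'Fault ID', 'Fault Description'};
    for ii = 1:length(markers)
        len = length(markers{ii});
        isin = strncmp(filetext, markers{ii}, len);   % lines starting with this marker
        data(:,ii) = cellfun(@(x) x(len+3:end), filetext(isin), 'UniformOutput', false);
    end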
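To get the result into something Excel can open, one option is a tab-delimited text file. This is only a sketch: the `data` values are copied from the question's sample in the shape the loop above produces, and the file name `failures.txt` and the header row are assumptions, not part of the original answer.

```matlab
% Parsed result, one row per failure report (values from the sample data)
data = {...
    '00:12:34', 'Converter fail',    'Channel 4 of the converter has failed during tuning'
    '00:12:37', 'Comparator 4 Fail', 'comparator 4 has failed'};
header = {'Time', 'Fault ID', 'Fault Description'};

outfile = 'failures.txt';                  % hypothetical output file name
fid = fopen(outfile, 'w');
fprintf(fid, '%s\t%s\t%s\n', header{:});   % header row
rows = data';                              % transpose so {:} expands row by row
fprintf(fid, '%s\t%s\t%s\n', rows{:});     % format is reused for every row
fclose(fid);
```

Tabs are used rather than commas because a fault description may itself contain commas.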

1 comment

Jennifer on 20 Sep 2013

Thank you Kelly,
I will give it a shot and see if it works out for me. I did simplify the problem a little by removing the other reports that will be randomly interspersed in the text file. For instance, there would be 'System Report' and 'Health Report' as well as the 'Failure Report' that I specified previously.

Your second and third assumptions are correct: each entry will have its own line, and each entry will have its phrase followed by a colon and then a tab (which didn't come out well in my formatting), followed by the entry of interest.
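With other report types mixed in and a tab after the colon, one possible approach is to label each line with the report header above it and parse only the 'Failure Report' blocks. The sketch below is illustrative only: the 'System Report' block and its 'Status' field are invented, and it assumes every report starts with a line of the form '<Name> Report'. It swaps the fixed character offset for a regexp that accepts any whitespace after the colon.

```matlab
tab = sprintf('\t');
filetext = {...
    'System Report'
    ['Time:' tab '00:12:30']
    ['Status:' tab 'OK']                % made-up field for illustration
    'Failure Report'
    ['Time:' tab '00:12:34']
    ['Fault ID:' tab 'Converter fail']
    ['Fault Description:' tab 'Channel 4 of the converter has failed during tuning']};

% Assign each line to the most recent '<Name> Report' header above it
isheader = ~cellfun('isempty', regexp(filetext, '^\w+ Report$', 'once'));
blockid  = cumsum(double(isheader));
headers  = filetext(isheader);
isfail   = false(size(filetext));
inblock  = blockid > 0;                 % lines before the first header are skipped
isfail(inblock) = strcmp(headers(blockid(inblock)), 'Failure Report');
failtext = filetext(isfail);

% Parse: marker, colon, any whitespace (space or tab), then the value
markers = {'Time', 'Fault ID', 'Fault Description'};
data = cell(0, numel(markers));
for ii = 1:numel(markers)
    tok = regexp(failtext, ['^' markers{ii} ':\s*(.*)$'], 'tokens', 'once');
    tok = tok(~cellfun('isempty', tok));            % keep matching lines only
    data(1:numel(tok), ii) = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
end
```

The 'Time' line of the 'System Report' block is ignored because only lines inside 'Failure Report' blocks reach the parse loop.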
