Import irregular and non-patterned text data into matrix form
I am sure this is covered elsewhere but I have been unable to find it. I am interested in creating a parser that will extract data from a text file and put it into a matrix where it will be viewable in Excel. A sample of my data is below:
Failure Report
Time: 00:12:34
Fault ID: Converter fail
Fault Description: Channel 4 of the converter has failed during tuning
\n
Failure Report
Time: 00:12:37
Fault ID: Comparator 4 Fail
Fault Description: comparator 4 has failed
\n
Failure Report
Time: 00:12:39
Fault ID: Converter fail in practice
Fault Description: Channel 4 of the converter has failed when in mode 5
\n
Failure Report
Time: 00:12:45
Fault ID: Converter 12 Fail
Fault Description: Converter 12 has failed because x = -2 and y = -4
So far I have used an if statement with regexp to find the 'Failure Report' and then go into that message. I can easily extract the time (because it is consistent) with textscan, but I am having problems wrapping my head around how to pull out the Fault ID and the Fault Description because they are irregular in size and type of data in them. Does anyone have a suggestion on how I could go about getting this information in a format like this?
Time________________Fault ID________________Failure Report
00:12:34____________Converter fail__________Channel 4 of the converter has failed during tuning
00:12:37____________Comparator 4 Fail________Comparator 4 has failed
Any help would be greatly appreciated.
~Jenn
2 Comments
Jennifer
on 24 Oct 2013
Kelly Kearney
on 28 Oct 2013
How did you read in your file? I assumed that your data would be a cell array, with one string per line, such as would be created by, say
fid = fopen('file.txt');
raw = textscan(fid, '%s', 'delimiter', '\n');  % named raw so it does not clash with data in the parse loop
fclose(fid);
filetext = raw{1};
but it seems you have a character array instead, from, maybe
filetext = fileread('file.txt');
Try running:
filetext = regexp(filetext, '\n', 'split')';
then try the parse portion of my code.
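A minimal sketch of that conversion, in case it helps. The sample string is a stand-in for what fileread would return, and rawtext is a name I made up for it. Splitting on '\r?\n' rather than '\n' and dropping empty lines are my additions, on the assumption that the file may have Windows line endings or a trailing newline.

```matlab
% Stand-in for: rawtext = fileread('file.txt');  (hypothetical contents)
rawtext = sprintf('Failure Report\r\nTime: 00:12:34\r\nFault ID: Converter fail\r\n');

% Split into one string per line; '\r?\n' also matches Windows line endings
filetext = regexp(rawtext, '\r?\n', 'split')';

% Drop empty lines, such as the one left behind by a trailing newline
filetext = filetext(~cellfun('isempty', filetext));
```

After this, filetext is a column cell array with one string per line, which is the shape the parse loop expects.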
Answers (1)
Kelly Kearney
on 20 Sep 2013
Assuming I've interpreted your file format correctly, you probably don't need regexp. My parsing below assumes that 1) each entry has the same info (i.e. time, ID, description, etc.), 2) each piece of info is on its own line, and 3) each entry of interest consists of a key phrase followed by a colon, a space, and the string of interest.
% Data (assuming file is read into a cell array)
filetext = {...
'Failure Report'
'Time: 00:12:34'
'Fault ID: Converter fail'
'Fault Description: Channel 4 of the converter has failed during tuning'
'\n'
''
'Failure Report'
'Time: 00:12:37'
'Fault ID: Comparator 4 Fail'
'Fault Description: comparator 4 has failed'
'\n'
''
'Failure Report'
'Time: 00:12:39'
'Fault ID: Converter fail in practice'
'Fault Description: Channel 4 of the converter has failed when in mode 5'
'\n'};
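% Parse: for each marker, keep the text after "<marker>: " on every matching line
markers = {'Time', 'Fault ID', 'Fault Description'};
for ii = 1:length(markers)
    len = length(markers{ii});
    % Lines that begin with this marker
    isin = strncmp(filetext, markers{ii}, len);
    % Skip the marker, the colon, and the space (len + 2 characters)
    data(:,ii) = cellfun(@(x) x(len+3:end), filetext(isin), 'UniformOutput', false);
end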
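Since the goal was to view the result in Excel, here is one possible way to get the parsed cell array into a file Excel can open. This is only a sketch: the file name faults.csv is made up, the two rows are copied from the sample data so the snippet runs on its own, and writing a quoted CSV with fprintf is my choice rather than the only option.

```matlab
% Parsed output in the same layout as data above
% (one row per report; columns are Time, Fault ID, Fault Description)
markers = {'Time', 'Fault ID', 'Fault Description'};
data = {
    '00:12:34', 'Converter fail',    'Channel 4 of the converter has failed during tuning'
    '00:12:37', 'Comparator 4 Fail', 'comparator 4 has failed'};

outfile = 'faults.csv';   % hypothetical output file name

% Double any embedded quote characters, as CSV quoting requires
quoted = strrep(data, '"', '""');

fid = fopen(outfile, 'w');
assert(fid ~= -1, 'Could not open %s for writing', outfile);
fprintf(fid, '%s,%s,%s\n', markers{:});              % header row
for k = 1:size(quoted, 1)
    % Quote every field so a comma inside a description stays in one column
    fprintf(fid, '"%s","%s","%s"\n', quoted{k, :});
end
fclose(fid);
```

On a Windows machine with Excel installed, xlswrite('faults.xls', [markers; data]) should also work and writes a native spreadsheet instead of a CSV.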
1 Comment
Jennifer
on 20 Sep 2013