How to parse text data
Life is Wonderful
on 17 Jul 2019
Commented: Life is Wonderful
on 2 Aug 2019
Hi,
I have data in the formats below and need a way to parse each one into the expected output.
Input data format:
07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ
07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file
Expected Output format:
Define level of message extraction based on the marker sign ==> |
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>| extract full text in a variable, option to grab variable if associated with value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|extract full text in a variable, option to grab variable if associated with value
-Step 5: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>|
-Step 6: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>|extract full text in a variable, option to grab variable if associated with value
Input data format:
07/16 12:55:27.620 DEBUG| utils:0287| [stdout] CHROMEOS_RELEASE_BOARD=hatch
07/16 13:28:58.330 INFO | mode_switcher:0673| -[FAFT]-[ start wait_for_client ]---
Expected Output format:
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> extract full text in a variable, option to grab variable if associated with value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> [string] extract full text in a variable, option to grab variable if associated with value
Input data format:
2019-07-16 12:55:30 > string
2019-07-16 12:55:30 powerbtn: released
Expected Output format:
Note the marker >
-Step 1: Extract Timestamp in YYYY-MM-DD HH:mm:sec > <string>
-Step 2: Extract Timestamp in YYYY-MM-DD HH:mm:sec <full string>
Input data format:
2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]
Expected Output format:
-Step 1: Extract Timestamp in YYYY-MM-DD HH:mm:sec > [<%1.3f string, extract full text in a variable, option to grab variable if associated with value>]
Thanks a lot
5 comments
Accepted Answer
Guillaume
on 23 Jul 2019
Edited: Guillaume
on 23 Jul 2019
Are you still on a very old version? (Please fill in the release field next to the question.) If you are on a modern version, the file can easily be read with:
VariableNames = {'Date', 'Level', 'delim1', 'PID', 'delim2', 'Message'};
VariableWidths = [19, 5, 1, 23, 2, 5000];
VariableTypes = {'datetime', 'char', 'char', 'char', 'char', 'char'};
opts = fixedWidthImportOptions('VariableNames', VariableNames, 'VariableWidths', VariableWidths, 'VariableTypes', VariableTypes, 'SelectedVariableNames', [1, 2, 4, 6]);
opts = setvaropts(opts, 'Date', 'InputFormat', 'MM/dd HH:mm:ss.SSS');  % month/day, 24-hour clock, milliseconds
content = readtable('test_that.txt', opts);
This results in a table with the selected variables Date, Level, PID, and Message.
If you are on a version of MATLAB that doesn't have tables, use textscan with fixed-width fields:
fid = fopen('test_that.txt', 'rt');
content = textscan(fid, '%18c%*c%5c%*c%23c%*2c%s', 'Delimiter', '', 'Whitespace', '');
fclose(fid);
content = [cellstr(content{1}), cellstr(content{2}), cellstr(content{3}), content{4}]
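If the column widths differ between log files, splitting on the `|` marker is another option. The following is only a minimal sketch and not part of the answer above: it uses one sample line from the question, the variable names (`logLine`, `fields`, `prefix`, `message`) are placeholders, and treating everything after the second marker as message text is an assumption. Note that strsplit and strjoin are not available in very old releases.

```matlab
% One sample line copied from the question.
logLine = '07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file';

% Split on every '|' marker and trim the padding around each piece.
fields = strtrim(strsplit(logLine, '|'));

% The first piece still holds both the timestamp and the level.
prefix = regexp(fields{1}, '^(\S+ \S+)\s+(\w+)$', 'tokens', 'once');
stamp = prefix{1};
level = prefix{2};

% Assumption: everything after the second marker is message text.
message = strjoin(fields(3:end), ' | ');

fprintf('%s | %s | %s | %s\n', stamp, level, fields{2}, message);
```

The number of elements in `fields` tells you how many marker levels a given line has, which may help with the step-by-step extraction described in the question.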
23 comments
More Answers (2)
Bob Thompson
on 18 Jul 2019
I need next steps
◾Convert the data content into cells, e.g. timestamp, message data-1, message data-2
◾Put cell in proper format
◾Create Matlab variables
◾Display Matlab variable for good analysis
1) regexp automatically outputs all results in a cell, each containing a string.
2) You can convert strings to date time formats using datetime. To do this 'quickly' I suggest using a loop through your regexp results, or by using cellfun (which is really still a loop).
3) What exactly do you mean by this? I personally do not know of a way to dynamically create variables within Matlab, and I think you would be better served to keep the information in a cell array, or to make a table out of it. It is certainly possible to create new variables in a table from a captured string from regexp.
4) Displaying Matlab variables is simply a matter of not suppressing them, or if specifically wanting to display them then you can use fprintf with no target so it defaults to the command window.
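The four points above could be put together roughly as follows. This is only a sketch: it assumes the first log format from the question, a release that has datetime and table, and a pattern written for the two sample lines rather than for every possible line. The column names (`Timestamp`, `Level`, `Source`, `Number`, `Message`) are placeholders. Because the log carries no year, datetime has to fill one in itself.

```matlab
% Two sample lines copied from the question, joined with newlines.
filecontent = sprintf('%s\n%s\n', ...
    '07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ', ...
    '07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file');

% 1) regexp returns one cell per matching line; each holds that line's tokens.
pattern = '^(\d\d/\d\d \d\d:\d\d:\d\d\.\d{3})\s+(\w+)\s*\|\s*(\w+):(\d+)\|\s*(.*)$';
tokens = regexp(filecontent, pattern, 'tokens', 'lineanchors', 'dotexceptnewline');
rows = vertcat(tokens{:});          % N-by-5 cell array of char vectors

% 2) Convert the timestamp text to datetime values.
timestamp = datetime(rows(:, 1), 'InputFormat', 'MM/dd HH:mm:ss.SSS');
timestamp.Format = 'MM/dd HH:mm:ss.SSS';

% 3) Keep everything in one table instead of creating variables dynamically.
T = table(timestamp, rows(:, 2), rows(:, 3), str2double(rows(:, 4)), rows(:, 5), ...
    'VariableNames', {'Timestamp', 'Level', 'Source', 'Number', 'Message'});

% 4) Display the result in the command window.
disp(T)
```

Individual columns can then be read back with dot indexing, for example `T.Message{2}` or `T.Timestamp(1)`.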
5 comments
Bob Thompson
on 19 Jul 2019
Are you only looking to capture the timestamp? It seems like the issue is more in the initial regexp processing than in the date time conversion.
If you are only looking to capture the timestamp I would suggest doing a regexp call like this:
filedata = regexp(filecontent, '(\d\d.\d\d\s\d\d.\d\d.\d\d.\d\d\d)\D+\d\d\d\d\D+\n', 'tokens');
dates = datetime([filedata{:}], 'InputFormat', 'MM/dd HH:mm:ss.SSS');
If you are looking to capture more than the timestamps then please explain more. I know you outline some more in your OP, but I'm not entirely sure what you're referring to.
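For the second log format in the question (a `yyyy-MM-dd HH:mm:ss` timestamp, optionally followed by the `>` marker), a similar regexp approach might look like the sketch below. It is a guess at what the original post is asking for: the token names (`stamp`, `marker`, `text`) are made up for illustration, and reading the leading number inside the brackets as the value to grab is an assumption.

```matlab
% Sample lines copied from the question.
logLines = {'2019-07-16 12:55:30 > string'; ...
            '2019-07-16 12:55:30 powerbtn: released'; ...
            '2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]'};

% Named tokens: timestamp, optional '>' marker, and the remaining text.
pattern = '^(?<stamp>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\s+(?<marker>[>]?)\s*(?<text>.*)$';
parts = regexp(logLines, pattern, 'names', 'once');
parts = [parts{:}];                 % struct array, one element per line

stamp = datetime({parts.stamp}', 'InputFormat', 'yyyy-MM-dd HH:mm:ss');

% Assumption: a number right after '[' is the value of interest.
num = regexp({parts.text}', '^\[(\d+\.\d+)', 'tokens', 'once');
value = nan(numel(num), 1);
for k = 1:numel(num)
    if ~isempty(num{k})
        value(k) = str2double(num{k}{1});
    end
end
```

Lines without a bracketed number keep `NaN` in `value`, so the text and the numeric value stay aligned row by row.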