How to extract numeric data between string lines?

Federico Geser on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
Hi MATLAB Community
I'm trying to solve a problem that is surely not new, but I haven't been able to find a proper solution.
I have a file with several header lines, followed by a lot of data in the following format:
Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00
Number of hit cells: 1
1 7.185244612628594E-05
Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 153, Primary(s) weight 1.0000E+00
Number of hit cells: 0
As shown, the "Number of hit cells" line is sometimes followed by lines of numbers. I would like to extract those numbers into a matrix or array. Is there a way to do this?
I attached an example file. The real files usually contain much more data, which I removed to keep the file size down.
Thank you very much in advance.

Accepted Answer

Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
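str = fileread('02-2021-Clearance-Box005_fort72.txt'); % read the whole file as one char vector
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)'; % digits (to end of line) that follow "Number of hit cells: N"
tmp = regexp(str,rgx,'match')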
tmp = 1x2 cell array
{'1 7.185244612628594E-05'} {'1 2.547905314713717E-04'}
vec = cellfun(@(s)sscanf(s,'%f',[1,Inf]),tmp,'uni',0) % convert to numeric
vec = 1x2 cell array
{1×2 double} {1×2 double}
mat = vertcat(vec{:}) % optional merge into one numeric matrix
mat = 2×2
1 7.1852e-05
1 0.00025479
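To try the idea without the attached file, the sample lines from the question can be placed in a string. The following is only a sketch: it uses a capture token instead of the lookbehind, which is a variation and not the code from the answer, and the names `sample`, `tok` and `pairs` are illustrative.

```matlab
% Sample text copied from the question (events 150 to 153).
% sprintf repeats '%s\n' for each argument, so every line ends with a newline.
sample = sprintf('%s\n', ...
    'Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0', ...
    'Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 1', ...
    '1 7.185244612628594E-05', ...
    'Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0', ...
    'Binning n: 1, "De19 ", Event #: 153, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0');

% Capture the rest of the line that follows "Number of hit cells: N",
% but only when that text starts with a digit (so "Binning ..." is skipped).
rgx = 'Number of hit cells:\s+\d+\s+(\d+[^\n]*)';
tok = regexp(sample, rgx, 'tokens');   % cell array of 1x1 cells
tok = [tok{:}];                        % flatten to a cell array of char vectors

% Convert each captured line to a numeric row vector, then stack the rows.
pairs = cellfun(@(s) sscanf(s, '%f', [1, Inf]), tok, 'UniformOutput', false);
mat = vertcat(pairs{:});
```

Only event 151 has a numeric line in this sample, so `mat` ends up with a single row holding the two values from that line.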
4 Comments
Federico Geser on 27 Jan 2021
Hi Stephen!
I think it works, but the test file has 12 MB of data to filter, so it might take a while. I don't know whether this will still work on the real results, which may be around 100 MB.
Nevertheless, very helpful solution! Thank you!
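For much larger files, one alternative worth timing is to read the file line by line with `fgetl` instead of loading it whole. This is a different technique from the `regexp` answer, not a faster version of it: it keeps peak memory low but may well be slower, so measuring both on the real file is the only way to know. The sketch below writes a few lines from the question to a temporary file so that it runs on its own. It assumes that every numeric line directly after a "Number of hit cells" line belongs to that event and that all such lines hold the same number of values. The names `fname`, `rows` and `inBlock` are illustrative.

```matlab
% Write a small sample (lines from the question) to a temporary file.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 1', ...
    '1 7.185244612628594E-05', ...
    'Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0');
fclose(fid);

% Read one line at a time and keep the numeric lines that follow
% a "Number of hit cells" line.
fid = fopen(fname, 'r');
rows = {};          % one numeric row vector per data line
inBlock = false;    % true while reading the lines right after a hit-cell count
txt = fgetl(fid);
while ischar(txt)   % fgetl returns -1 (not char) at end of file
    if ~isempty(strfind(txt, 'Number of hit cells:'))
        inBlock = true;
    elseif inBlock
        vals = sscanf(txt, '%f', [1, Inf]);
        if ~isempty(vals)
            rows{end+1} = vals;   %#ok<SAGROW> growth is acceptable in a sketch
        else
            inBlock = false;      % a non-numeric line ends the block
        end
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);

% Stack the rows; this assumes every data line has the same number of values.
mat = vertcat(rows{:});
```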
Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
If there are always exactly two numbers on each of those lines, then this is probably more efficient:
str = fileread('02-2021-Clearance-Box005_fort72.txt');
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)'; % unchanged
tmp = regexp(str,rgx,'match'); % unchanged
mat = sscanf(sprintf(' %s',tmp{:}),'%f',[2,Inf]).'
mat = 2×2
1 7.1852e-05
1 0.00025479
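The one-line conversion relies on two behaviors: `sprintf` reuses its format for every cell, so all matches are joined into one space-separated string, and `sscanf` fills its `[2,Inf]` output column by column, so the transpose puts each pair on its own row. A minimal sketch of those steps, using the two matched strings from the output above entered by hand (`joined` and `cols` are illustrative names):

```matlab
% The two matches shown in the answer's output, entered by hand.
tmp = {'1 7.185244612628594E-05', '1 2.547905314713717E-04'};

% sprintf repeats ' %s' for every cell, giving one space-separated string.
joined = sprintf(' %s', tmp{:});

% sscanf fills a 2-by-N matrix column by column: each column is one pair.
cols = sscanf(joined, '%f', [2, Inf]);

% Transpose so that each row is one pair, as in the answer.
mat = cols.';
```

This works only when every matched line holds exactly two numbers. With a varying count, the column-wise fill would mix values from different lines, which is why the `cellfun` version is the safer general case.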

Release

R2019b
