Extract numbers from mixed string
I have a file containing header lines like the following,
Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50
Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)
For a given parameter such as MaxDistance or Wind Speed, I would like to extract its numerical value. This is tricky because sometimes there is an equal sign, space, or units, and sometimes there is not, because different operators enter their notes differently (lesson: next time enforce consistency).
How would I extract the following: All numerical characters (ignoring spaces and equal signs but keeping decimal points) that appear after the string representing the parameter name. Stop when a letter or punctuation mark is reached. In the case of 'MaxDistance', I would obtain 60. In the case of Wind Speed, I would obtain 16.375.
2 comments
Albert Yam
on 19 Jul 2012
Edited: John Kelly
on 26 Feb 2015
What have you tried?
Jianming She
on 17 Jun 2020
Edited: Jianming She
on 18 Jun 2020
This seems a more general way:
function numArray = extractNumFromStr(str)
str1 = regexprep(str,'[,;=]', ' ');
str2 = regexprep(regexprep(str1,'[^- 0-9.eE(,)/]',''), ' \D* ',' ');
str3 = regexprep(str2, {'\.\s','\E\s','\e\s','\s\E','\s\e'},' ');
numArray = str2num(str3);
Example:
a = 'alpha=-3.5,beta=1e-2. but gamma = -34.1'
numArray = extractNumFromStr(a)
numArray =
-3.5000 0.0100 -34.1000
Accepted Answer
More Answers (5)
Stephan Koehler
on 7 Jun 2017
6 votes
Here is a one-line answer:
str2num( regexprep( Str, {'\D*([\d\.]+\d)[^\d]*', '[^\d\.]*'}, {'$1 ', ' '} ) )
2 comments
Alexandre THIBEAULT
on 27 Jan 2021
Best answer
Marco A. Acevedo Z.
on 2 Apr 2021
Hi, good answer, but how can the minus sign be included (if present)? Thanks.
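One possible way to address the minus-sign question is sketched below. It does not modify the one-liner above; it swaps in a different technique (regexp with the 'match' option followed by str2double). The sample string is an assumption made up for illustration.

```matlab
% Sketch: keep an optional leading minus sign when pulling numbers out of text.
% Str is a made-up sample, not data from the original question.
Str = 'Air Temperature=-20 C, Wind Speed 16.375m/s, Offset -.5';

% -?        optional minus sign
% \d*\.?\d+ digits with an optional decimal point (also covers '.5')
matches = regexp(Str, '-?\d*\.?\d+', 'match');

% str2double accepts a cell array of char vectors and returns a numeric array
nums = str2double(matches);   % -20, 16.375, -0.5
```

A caveat of this pattern: a hyphen used as a range separator, as in '10-20', would be read as the sign of the second number.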
Freddy
on 19 Jul 2012
Maybe a little late, but I would also like to present my ("regexp training") solution. :)
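% Named tokens: Keyword is the parameter name, Value is the number that follows it
A = regexp(Str,'(?<Keyword>(?:\w+\s*\w+))\s*=?\s*(?<Value>\d+\.?\d*)','names');
s = struct();
% Store each value under a field named after its keyword
for i = A
    s.(genvarname(i.('Keyword'))) = str2double(i.('Value'));
end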
1 comment
Albert Yam
on 19 Jul 2012
Edited: Albert Yam
on 19 Jul 2012
It took me a long time to understand what you are doing. That's cool, though.
How does it skip over 'Operator Note:'?
Edit: Never mind I get it. It doesn't have anything for ':'. The '(?:\w' has nothing to do with a ':' in the string, it is grouping the token for 'up to two words'.
Albert Yam
on 19 Jul 2012
This is how I went about it, all steps included even the errors.
teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50 Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
regexp(teststr,[\d])
regexp(teststr,['\d'])
regexp(teststr,['\d'],'match')
regexp(teststr,['\d+'],'match')
regexp(teststr,['\d+.?'],'match')
regexp(teststr,['\d+\.?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d+?'],'match')
regexp(teststr,['\d+\.?\d*?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d*'],'match')
6 comments
K E
on 19 Jul 2012
Albert Yam
on 19 Jul 2012
Learning is fun.
Be careful with the last solution:
'\d+\.?\d*'
With this case:
teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = .5 Operator Note: Air Temperature=-20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
it doesn't work: it fails on negative numbers and on the '.xxx' notation (Air Temperature and Permitted Error in this sample, respectively).
Has anyone already handled these cases?
G
on 7 Nov 2013
Solved!
regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d*','match')
Better:
regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d+|-\.?\d+','match')
or
regexp(teststr,'-?\d+\.?\d*|-?\d*\.?\d+','match')
The -.34e-004 case (scientific notation) remains unhandled!
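For the remaining scientific-notation case, one option is to append an optional exponent group to the pattern. The following is a minimal sketch, assuming an exponent is always written as e or E followed by an optional sign and at least one digit; the sample string is made up for illustration.

```matlab
% Sketch: numbers with optional sign, optional leading digits, and optional exponent.
% teststr2 is a made-up sample for illustration.
teststr2 = 'a = -.34e-004, b = 1e-2, c = 16.375m/s, d = -20 C';

% [-+]?               optional sign
% (\d+\.?\d*|\.\d+)   '12', '12.', '12.5' or '.5'
% ([eE][-+]?\d+)?     optional exponent; a bare 'e' without digits is not consumed
pat = '[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?';

matches = regexp(teststr2, pat, 'match');   % {'-.34e-004','1e-2','16.375','-20'}
nums = str2double(matches);                 % numeric row vector, one entry per match
```

Because the exponent group requires digits after the e, the unit letters that follow a number (such as the m in 16.375m/s) are left out of the match.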
Angkur Shaikeea
on 21 Oct 2021
Edited: Angkur Shaikeea
on 21 Oct 2021
I need to extract
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
from a text file containing
.............................................
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Any help using regexp?
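One way to approach the nodal-positions question is to split the text on the label with regexp and let sscanf read the numbers in each piece. This is only a sketch under assumptions: every block holds a multiple of three numbers, the text is already in a char vector (for a real file, fileread could supply it), and the variable names are hypothetical.

```matlab
% Stand-in for the file contents (made up to mirror the sample in the question).
txt = sprintf(['.....\n', ...
    'Nodal positions:\n', ...
    '0.00000 0.00000 0.00000\n0.00000 1.00000 0.00000\n1.00000 0.00000 0.00000\n', ...
    'Nodal positions:\n', ...
    '0.00000 0.00000 0.00000\n0.00000 1.00000 0.00000\n1.00000 0.00000 0.00000\n']);

% Split on the label; the first piece is whatever precedes the first label.
parts = regexp(txt, 'Nodal positions:', 'split');
parts(1) = [];

nodes = cell(numel(parts), 1);
for k = 1:numel(parts)
    % sscanf reads whitespace-separated numbers and stops at the first non-numeric text
    vals = sscanf(parts{k}, '%f');
    % Assumption: three coordinates per row
    nodes{k} = reshape(vals, 3, []).';
end
```

Each cell of nodes then holds one N-by-3 matrix, for example nodes{1} is [0 0 0; 0 1 0; 1 0 0] for the sample text.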
C.J. Harris, I put your regexp into a function that extracts all numbers from a string. I had a hard time finding an array operation that uses 'a' and 'b' without the loop. Hopefully somebody has ideas. Of course, it is not difficult to add more parameters or options to find specific numbers with preceding or following landmark strings.
function nums = regExtractNums(str)
    [a,b] = regexp(str, '\d+(\.\d+)?');
    nums = zeros(length(a),1);
    for k = 1:length(a)
        nums(k) = str2double(str(a(k):b(k)));
    end
end
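A loop-free variant is possible if regexp returns the matched substrings instead of their start and end indices, since str2double converts a whole cell array in one call. The sketch below uses a hypothetical function name and the same pattern as the function above.

```matlab
function nums = regExtractNumsNoLoop(str)
% Sketch of a loop-free variant (hypothetical name): regexp returns the
% matched substrings directly, and str2double converts them all at once.
    matches = regexp(str, '\d+(\.\d+)?', 'match');  % 1-by-N cell of char vectors
    nums = str2double(matches(:));                   % column vector, as in the loop version
end
```

For example, regExtractNumsNoLoop('Rate = 1.000, Altitude 5km') gives the column vector [1; 5].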
C.J. Harris
on 19 Jul 2012
In order to extract a certain value:
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
matchWord = 'Air Temperature';
[a,b] = regexp(Str,'\d+(\.\d+)?');
strPos = find(a > strfind(Str,matchWord),1,'first');
nValue = str2double(Str(a(strPos):b(strPos)));
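The same idea can be folded into a single pattern that anchors the number directly to the parameter name. The helper below is a sketch, not the method from the answer above: getParamValue is a hypothetical name, and it assumes only spaces and equal signs may sit between the name and its value.

```matlab
function value = getParamValue(str, name)
% getParamValue (hypothetical helper): numeric value that follows a parameter name.
% Assumption: only whitespace and '=' separate the name from the number.
% Returns NaN when the name is absent or no number follows it.
    % Escape the name so characters special to regexp are matched literally
    pat = [regexptranslate('escape', name), '[\s=]*(-?\d*\.?\d+)'];
    tok = regexp(str, pat, 'tokens', 'once');
    if isempty(tok)
        value = NaN;
    else
        value = str2double(tok{1});
    end
end
```

With the Str defined above, getParamValue(Str, 'MaxDistance') returns 60 and getParamValue(Str, 'Wind Speed') returns 16.375. Unlike the index-based search, this version returns NaN rather than picking up the next number in the string when nothing numeric follows the name.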