Unexpected comma in regexp output

Hi everyone, I want to extract the word "Df(3R)ED50003" from the string below.
The word is composed of the characters A-Z a-z 0-9 - _ ( )
aStr = 'w[1118]; Df(3R)ED50003, P{w[+mW.Scer\FRT.hs3]=3''.RS5+3.3''}ED50003/TM6C, cu[1] Sb[1]';
Below is the code I used:
[t1,t2] = regexp(aStr,'.*(Df[\(\)-_a-zA-Z0-9]+).*','tokens')
However, I got:
t1{1}{1}
Df(3R)ED50003,
There is a comma at the end which I did not include in the regexp. I expected Df(3R)ED50003, but the result has an extra trailing comma.
Can someone help me see where I went wrong? Thanks

Accepted Answer

Guillaume on 26 Oct 2019
Edited: Guillaume on 26 Oct 2019
Note that your example input is not valid MATLAB syntax. I assume the internal ' are meant to be doubled.
I'm not too sure what you're trying to do with your regex, some of it is overcomplicated, e.g.:
regexp(s, '.*(somexpr).*', 'tokens')
is the same as the simpler (and most likely much faster, since .* can slow regexp tremendously if used carelessly)
regexp(s, 'somexpr', 'match')
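To make the comparison above concrete, here is a minimal sketch of both calls side by side. The input string is a shortened, hypothetical version of the one in the question, and the expression Df\(\w+\)\w+ is only an illustrative stand-in for somexpr. The two forms agree here because the string contains a single occurrence of the pattern; note that 'tokens' returns a cell array of cells, while 'match' returns a cell array of char vectors.

```matlab
% Shortened, hypothetical input used only for illustration.
s = 'w[1118]; Df(3R)ED50003, cu[1] Sb[1]';
expr = 'Df\(\w+\)\w+';                          % literal parentheses are escaped

tok = regexp(s, ['.*(' expr ').*'], 'tokens');  % cell array of cells
mat = regexp(s, expr, 'match');                 % cell array of char vectors

disp(tok{1}{1})   % Df(3R)ED50003
disp(mat{1})      % Df(3R)ED50003
```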
I'm not entirely clear on what exactly you want to include in your match. I don't think you fully understand how [] works in a regexp, and in particular the role of - in there. Your [\(\)-_a-zA-Z0-9]+ expression matches one or more of:
  • a (, your \(,
  • any character in the range ')':'_', your \)-_. Note that this range does include the comma. It's probably where you went wrong.
  • any character in the range 'a':'z',
  • any character in the range 'A':'Z',
  • any digit, your 0-9
>> ')':'_' %all characters matched by \)-_
ans =
')*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
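The effect of the accidental range can be checked directly. The sketch below uses the string and character class from the question (without the surrounding .* and capture group, and with the 'once' option added for convenience). The range written as \)-_ runs from ) to _ in character-code order, and the comma sits inside it, which is why the match picks up the trailing comma.

```matlab
% Characters covered by the range \)-_ (character codes 41 to 95).
rangeChars = ')':'_';
disp(any(rangeChars == ','))    % 1: the comma is inside the range

aStr = 'w[1118]; Df(3R)ED50003, P{w[+mW.Scer\FRT.hs3]=3''.RS5+3.3''}ED50003/TM6C, cu[1] Sb[1]';

% The unescaped - between \) and _ forms a range, so the comma is matched.
bad = regexp(aStr, 'Df[\(\)-_a-zA-Z0-9]+', 'match', 'once');
disp(bad)                       % Df(3R)ED50003,
```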

4 Comments

raym on 26 Oct 2019
Thanks. My mistake was not adding a backslash \ before - to indicate the literal - character.
Below works:
'.*(Df[\(\)\-_a-zA-Z0-9]+).*'
Guillaume on 26 Oct 2019
Or you could move the - to the beginning or the end of the list, where it doesn't need escaping:
'[-\(\)_a-zA-Z0-9]+' % - doesn't need escaping when it's the first in the list
%or
'[\(\)_a-zA-Z0-9-]+' % or when it's the last
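As a quick check, the three spellings of the character class discussed in this thread (escaped hyphen, hyphen first, hyphen last) can be compared on the same input. This is a minimal sketch: the input is a shortened, hypothetical version of the string in the question, and the Df prefix is taken from the original pattern.

```matlab
% Shortened, hypothetical input used only for illustration.
aStr = 'w[1118]; Df(3R)ED50003, cu[1] Sb[1]';

patterns = {'Df[\(\)\-_a-zA-Z0-9]+', ...   % hyphen escaped
            'Df[-\(\)_a-zA-Z0-9]+',  ...   % hyphen first in the list
            'Df[\(\)_a-zA-Z0-9-]+'};       % hyphen last in the list

results = cell(size(patterns));
for k = 1:numel(patterns)
    results{k} = regexp(aStr, patterns{k}, 'match', 'once');
    fprintf('%s -> %s\n', patterns{k}, results{k});
end
```

Each pattern should return Df(3R)ED50003 without the trailing comma, since in all three the - is treated as a literal character rather than a range operator.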
raym on 27 Oct 2019
Yes.
I also found that the two commands below give the same result:
regexp(s, 'somexpr', 'match')
regexp(s, '(somexpr)', 'match')
somexpr can be surrounded by () even when there is no () in the string.
Stephen23 on 27 Oct 2019
"somexpr can be surrounded by () even when there is no () in the string."
That is because parentheses are a grouping operator, not literal characters.
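A small sketch of the distinction, using a hypothetical input string and illustrative patterns: unescaped parentheses only group part of the expression and do not change what 'match' returns, whereas escaped parentheses match the literal ( and ) characters.

```matlab
s = 'Df(3R)ED50003';   % hypothetical input used only for illustration

a = regexp(s, 'ED\d+',   'match', 'once');   % no group
b = regexp(s, '(ED\d+)', 'match', 'once');   % grouped: same match as a
c = regexp(s, '\(\w+\)', 'match', 'once');   % escaped: literal ( and )

disp(isequal(a, b))   % 1
disp(c)               % (3R)
```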


