How to capture an optional expression using regular expressions?

Patrick Mboma on 23 Sep 2015
Commented: Walter Roberson on 25 Sep 2015
Dear all,
I would like to use regular expressions to capture and transform expressions of the form
name
or
name(string,digits)
where name belongs to a list of NAMES and digits is an integer: 1, 2, 3,...
That is, "name" is optionally followed by
  1. an opening parenthesis: (
  2. a string
  3. a comma: ,
  4. an integer (digits): 1, 2, 3, ...
  5. a closing parenthesis: )
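A minimal sketch of a pattern for this format, assuming each cell holds exactly one expression and the string part consists of word characters (\w). The entries and the token names name, str, and digits are hypothetical. Literal parentheses must be escaped as \( and \), and the whole optional suffix is wrapped in a non-capturing group (?: ... ) followed by ?:

```matlab
% Hypothetical example entries: a bare name and a name with (string,digits).
entries = {'alpha', 'beta(x,12)'};

% \( and \) match literal parentheses; (?: ... )? makes the whole
% "(string,digits)" suffix optional without capturing the group itself.
% ^ and $ assume each cell holds exactly one expression.
expr = '^(?<name>\w+)(?:\((?<str>\w+),(?<digits>\d+)\))?$';

for k = 1:numel(entries)
    % 'names' returns a struct with one field per named token; fields
    % belonging to an absent optional part are expected to be empty.
    t = regexp(entries{k}, expr, 'names', 'once');
    fprintf('%-12s name=%s str=%s digits=%s\n', ...
            entries{k}, t.name, t.str, t.digits);
end
```

Named tokens are used here only so the three parts can be read by name instead of by position.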
For that purpose I wrote the following regular expression, which does not work:
expr='(\w+)((\w+),(\d+))?';
replace='${convertMe($1,$3,$4)}';
result=regexprep(cellarray,expr,replace)
I have written a convertMe function that takes 3 inputs, but only the first input is passed correctly. For the other two inputs, the function receives the literal text $3 and $4 instead of the second string and the digits.
Any suggestions?

Answers (2)

Walter Roberson on 23 Sep 2015
For the longer case:
expr='(\w+)(?:\(\w+),\(d+)\))';
replace='${convertMe($1,$2,$3)}';
For the case with no argument supplied, it is not clear what you would like passed to convertMe or if you want convertMe to be called at all.
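For the longer case alone, a sketch with the literal parentheses escaped and all three tokens required, so that $1, $2, and $3 always refer to text that actually matched. The entries are hypothetical, and the built-in upper only stands in for a user function such as convertMe. Entries in the short form do not match this pattern and are returned unchanged:

```matlab
% Hypothetical entries: one long form and one bare name.
entries = {'alpha(x,12)', 'beta'};

% Escaped parentheses; no optional part, so every token participates.
expr = '(\w+)\((\w+),(\d+)\)';

% ${...} evaluates MATLAB code on the tokens; upper is only a stand-in
% for a user function. $2 and $3 outside ${...} are plain token references.
replace = '${upper($1)}_$2_$3';

% 'beta' does not match expr, so regexprep leaves it as is.
result = regexprep(entries, expr, replace)
```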
  2 comments
Patrick Mboma on 23 Sep 2015
Hi Walter, thanks for the answer. It seems to me that your suggestion has some problems. I tried to correct it as follows:
expr='(\w+)(?:\((\w+),(\d+)\))'
But it still doesn't work. I capture the first \w+, but not the second one or the \d+. The second and third arguments received by convertMe are the literal text $2 and $3.
In the short case, I would like to capture only the first \w+. I would be fine if, in that case, convertMe received $2 and $3 as its second and third input arguments, respectively.
Walter Roberson
Walter Roberson on 25 Sep 2015
Odd. I had it working the other day, but now it doesn't.



Cedric on 24 Sep 2015
Another option is to parse all entries first, and then to rebuild relevant expressions:
entries = {'name1(John, 48)', 'name2', 'name3(Doo)'} ;
tokens = regexp( entries, '([\w\d_-]+)\(?(\w+)?,?\s*(\d+)?', 'tokens', 'once' ) ;
parsed = vertcat( tokens{:} ) ;
With that you get
>> parsed
parsed =
'name1' 'John' '48'
'name2' '' ''
'name3' 'Doo' ''
which is easy to post-process to build whatever you need.
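As one possible post-processing step, a sketch that rebuilds each entry from the rows of parsed. It is written as a function file named rebuildEntries.m (a hypothetical name), and the local convertMe below is only a placeholder for the asker's own function: it treats an empty str or digits as a sign that the corresponding optional part was absent.

```matlab
function out = rebuildEntries(entries)
% Parse every entry first (same pattern as above), then rebuild it.
tokens = regexp(entries, '([\w\d_-]+)\(?(\w+)?,?\s*(\d+)?', 'tokens', 'once');
parsed = vertcat(tokens{:});          % one row per entry: {name, str, digits}
out = cell(size(parsed, 1), 1);
for k = 1:size(parsed, 1)
    out{k} = convertMe(parsed{k, 1}, parsed{k, 2}, parsed{k, 3});
end
end

function s = convertMe(name, str, digits)
% Hypothetical placeholder for the asker's convertMe; empty inputs mean
% the optional part was missing from the original entry.
if isempty(str)
    s = name;
elseif isempty(digits)
    s = sprintf('%s_%s', name, str);
else
    s = sprintf('%s_%s_%s', name, str, digits);
end
end
```

Calling rebuildEntries({'name1(John, 48)', 'name2', 'name3(Doo)'}) should return {'name1_John_48'; 'name2'; 'name3_Doo'}, given the parsed table shown above.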
