MATLAB Answers

Regular expressions: is nesting of grouping operators supported?

Asked by per isakson
on 17 Dec 2018
Latest activity Edited by per isakson
on 14 Oct 2019 at 9:09
Regarding Grouping Operators: the function regexp doesn't behave the way I expected.
>> cac = regexp( 'ABC', '((A)(B(C)))', 'tokens' );
>> cac{1}(:)
ans =
1×1 cell array
{'ABC'}
regexp returns a single token and raises no error or warning about my parentheses. I expected four: 'ABC', 'A', 'BC' and 'C', because most other flavors of regular expressions would have returned four tokens. The Java documentation on Capturing Groups, for example, says:
In the expression ((A)(B(C))), for example, there are four such groups:
  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)
Another couple of examples
>> cac = regexp( 'ABC', '(A)(B(C))', 'tokens' );
>> cac{1}(:)
ans =
2×1 cell array
{'A' }
{'BC'}
>> cac = regexprep( 'ABC', '((A)(B(C)))', ' --- $1 ---' )
cac =
' --- ABC ---'
>> cac = regexprep( 'ABC', '((A)(B(C)))', ' --- $2 ---' )
cac =
' --- $2 ---'
The documentation on Grouping Operators is terse and has only a few examples. I've found nothing on "groups inside groups".
Question:
Is nesting of grouping operators supported, or am I a victim of wishful thinking?


1 Answer

Answer by Sean de Wolski
on 17 Dec 2018
Edited by Sean de Wolski
on 17 Dec 2018
 Accepted Answer

The Note below "Named Token Operator" in the regexp documentation says that only the outermost parentheses are captured. That is why '((A)(B(C)))' returns 'ABC' as its single token.
Note
If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.
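The behavior described in the note can be checked directly. The following is a minimal sketch that reuses the pattern from the note; the variable name tok is only illustrative. If the note holds, regexp returns a single token covering the outermost group.

```matlab
% Pattern and input taken from the documentation note quoted above.
% 'once' makes regexp return the tokens of the first match as a 1-by-n cell.
tok = regexp('andrew', '(and(y|rew))', 'tokens', 'once');

% Per the note, only the outermost group should be captured.
fprintf('Number of tokens: %d\n', numel(tok));
fprintf('First token: %s\n', tok{1});
```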
As a workaround with string arrays, I'd recommend passing an array of patterns, one for each token you want:
cac = regexp("ABC", ["(ABC)","(A)", "(BC)", "(C)"], 'tokens' );
cac{:}
ans =
1×1 cell array
{["ABC"]}
ans =
1×1 cell array
{["A"]}
ans =
1×1 cell array
{["BC"]}
ans =
1×1 cell array
{["C"]}
