Why is my regular expression always greedy?
I have the following string, read into MATLAB:
*aaa
$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
1111111111111111111111
222222222222
3333333333333333333333333
4444444444555556666666
777777788899999
*ddd
$11111111111111111111111111111111
222222222222222abcdf
99999999999
*abcde99999
$eeeeeeeeeeeeeeeeeeeeee
I would like to perform a search that only extracts the text between *aaa and *ddd, using the following regexp pattern:
pattern = '(?<=\*aaa\s)(.*|\n)*?(?=\*)';
I expected the middle (.*|\n)*? to match the minimum number of "either any character other than linebreak, or a linebreak" that sits between *aaa and the closest * symbol, at *ddd. Instead, MATLAB returns the following:
$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
1111111111111111111111
222222222222
3333333333333333333333333
4444444444555556666666
777777788899999
$11111111111111111111111
*ddd
$11111111111111111111111111111111
222222222222222abcdf
99999999999
Instead of stopping at just before *ddd, regexp continued until just before *abcde99999, despite the presence of the "?" at the end of the middle section of the pattern.
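A likely explanation, offered as a sketch rather than a confirmed diagnosis: MATLAB's regexp runs in 'dotall' mode by default, so the dot also matches line breaks. The trailing "?" only makes the outer group repetition lazy. The inner .* is still greedy, so the first pass through the group consumes everything up to the end of the string and then backs off until the lookahead finds an asterisk, which is the last one. The example below uses a shortened stand-in for the string above (the sample data and variable names are hypothetical) and compares the original pattern with a plain lazy dot.

```matlab
% Shortened stand-in for the text in the question (hypothetical sample data).
str = sprintf('*aaa\n$bbb\n111\n*ddd\n$222\n999\n*abcde99999\n$eee');

% Original pattern: the outer group is lazy, but the inner .* is greedy,
% and in the default 'dotall' mode the dot also matches newlines.
greedyPattern = '(?<=\*aaa\s)(.*|\n)*?(?=\*)';
greedyMatch = regexp(str, greedyPattern, 'match', 'once');

% Lazy dot on its own: grows one character at a time and stops as soon
% as the lookahead sees the next asterisk.
lazyPattern = '(?<=\*aaa\s).*?(?=\*)';
lazyMatch = regexp(str, lazyPattern, 'match', 'once');

disp(greedyMatch)   % runs on past *ddd, up to just before *abcde99999
disp(lazyMatch)     % stops just before *ddd
```

The alternation with \n is not needed here, because under the default option the dot already covers line breaks.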
Just to make sure this isn't a lookaround issue, I also tried running
pattern = '\*(.*|\n)*?\*';
And sure enough, I get the following, with the *ddd in the middle being skipped entirely:
*aaa
$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
1111111111111111111111
222222222222
3333333333333333333333333
4444444444555556666666
777777788899999
$11111111111111111111111
*ddd
$11111111111111111111111111111111
222222222222222abcdf
99999999999
*
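One way to sidestep the greedy versus lazy question altogether is a negated character class, which cannot cross an asterisk no matter how the dot is configured. The following is a minimal sketch on a shortened, hypothetical version of the string (variable names are illustrative only); it assumes the text between the markers never contains an asterisk itself.

```matlab
% Shortened stand-in for the text in the question (hypothetical sample data).
str = sprintf('*aaa\n$bbb\n111\n*ddd\n$222\n999\n*abcde99999\n$eee');

% [^*]* matches any run of characters, newlines included, that holds no asterisk,
% so the match has to end at the very next asterisk.
firstSpan = regexp(str, '\*[^*]*\*', 'match', 'once');

% Capture only the text between *aaa and the next asterisk as a token.
tok = regexp(str, '\*aaa\s([^*]*)\*', 'tokens', 'once');
inner = tok{1};

disp(firstSpan)   % from *aaa through the asterisk that starts *ddd
disp(inner)       % only the lines between *aaa and *ddd
```

With 'tokens' and 'once', regexp returns a cell array of the captured groups, so the text is read out with tok{1}.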