
Regex alternation construct eats part of previous group

I am trying to fashion a regex to capture a function-style argument list, which should be straightforward enough, but I'm encountering a behaviour I don't understand.

In the snippet below the first example behaves as you would expect, capturing the function name into the first group and the argument list into the second group.

In the second example I replace the 'zero or more' quantifier that captures the argument list with the 'one or more' quantifier, so that the second group will fail if there are no arguments. I'm expecting the regex to capture just the function name (func1), but for some reason it is eating the '1' off the end of the function name, and I cannot for the life of me see why it would be doing that. Can anyone see what's going wrong with this please?

// {func1} {blah, blah, blah}
Match m13 = Regex.Match("func1(blah, blah, blah)", @"(\w+) (?([(]) [(]([^)]*) )",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

// {func}
Match m14 = Regex.Match("func1()", @"(\w+) (?([(]) [(]([^)]+) )",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

Your expression can be adjusted to:

(\w+) (?([(]) [(]([^)]*) )
                      ^ rather than +

The reason the expression returns an unexpected result relates to backtracking. The regex engine effectively takes the following steps:

  1. (\w+) matches func1.
  2. func1 is immediately followed by a (, so the zero-width test expression of the conditional matching construct succeeds.
  3. The "yes" branch of the conditional construct requires a literal ( followed by one or more characters that are not ). This fails for the input func1(), since there are zero characters between ( and ).
  4. The engine backtracks to step 1 and gives up one character, so that (\w+) now matches func instead of func1.
  5. func is immediately followed by a 1, which does not satisfy the zero-width test expression of the conditional matching construct.
  6. Since the test fails and there is no alternate ("no") branch, the conditional matches the empty string. The regex completes successfully with func in the first captured group, and no match in the second captured group.
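
The steps above can be reproduced with a small console program. This is only a sketch: the class name ConditionalBacktrackDemo and the helper Describe are made up for illustration, while the patterns and options are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalBacktrackDemo
{
    // Same options as in the question.
    private const RegexOptions Options =
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;

    // Hypothetical helper (not part of any library): reports group 1 and
    // group 2, marking a group that did not take part in the match.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, Options);
        string name = m.Groups[1].Success ? m.Groups[1].Value : "<unset>";
        string args = m.Groups[2].Success
            ? "\"" + m.Groups[2].Value + "\""
            : "<unset>";
        return name + " / " + args;
    }

    public static void Main()
    {
        // '+' in the "yes" branch: the branch fails on "()", \w+ backtracks
        // to "func", the test then fails and the conditional matches empty.
        Console.WriteLine(Describe("func1()", @"(\w+) (?([(]) [(]([^)]+) )"));

        // '*' in the "yes" branch: an empty argument list is accepted, so
        // \w+ keeps "func1" and group 2 captures an empty string.
        Console.WriteLine(Describe("func1()", @"(\w+) (?([(]) [(]([^)]*) )"));
    }
}
```

Checking Groups[2].Success rather than Groups[2].Value is what distinguishes "no argument list captured" from "an empty argument list captured".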

The issue emerges in step 3, where the expression fails to allow () as a legal argument list. Adjusting the expression to allow zero characters between the opening and closing parentheses (as illustrated above) allows this sequence. An expression such as ^(\w+)(?:\((.*)\))?$ may also address the underlying issue without the need for a conditional construct.
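
As a rough illustration of that last expression, here is a minimal sketch. The class name AnchoredCallPattern and the helper Parse are hypothetical, and it assumes the whole input is a single call with no nested parentheses to worry about.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredCallPattern
{
    // The anchored expression from the answer: a name, optionally followed
    // by a parenthesised argument list, and nothing else.
    private static readonly Regex Call = new Regex(@"^(\w+)(?:\((.*)\))?$");

    // Hypothetical helper: returns "name | args", or "<no match>".
    // "<no list>" means the optional (...) part was absent altogether.
    public static string Parse(string input)
    {
        Match m = Call.Match(input);
        if (!m.Success)
        {
            return "<no match>";
        }
        string args = m.Groups[2].Success ? m.Groups[2].Value : "<no list>";
        return m.Groups[1].Value + " | " + args;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("func1(blah, blah, blah)"));
        Console.WriteLine(Parse("func1()"));   // empty argument list
        Console.WriteLine(Parse("func1"));     // no parentheses at all
        Console.WriteLine(Parse("func1("));    // unbalanced input
    }
}
```

Because of the ^ and $ anchors, the engine cannot settle for a shortened name such as func: a partial match is rejected outright instead of succeeding with a truncated first group.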

