
Weird Regex behavior in C#

I am trying to extract some alphanumeric expressions out of a longer word in C# using regular expressions. For example, I have the word "FooNo12Bee". I use the following regular expression code, which gives me two results, "No12" and "No":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");

If I use the following expression, without parentheses and without any alternative for "No", it works the way I expect and returns only "No12":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"No\d{1,3}");

What is the difference between these two expressions? Why does using parentheses produce a redundant result for "No"?

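Parentheses in a regex create capture groups, meaning that whatever is matched between the parentheses is captured and stored as a group. You are not getting two matches: you are getting one match ("No12") that contains one capture group ("No").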
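To see this for yourself, here is a minimal sketch that prints what the Match object holds for the pattern from the question. It assumes C# 9 or later so that top-level statements compile; the comments show what I would expect it to print.

```csharp
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");

// One match only; its Groups collection holds the whole match plus one capture group.
Console.WriteLine(m.Success);          // True
Console.WriteLine(m.Groups.Count);     // 2
Console.WriteLine(m.Groups[0].Value);  // No12 (the whole match, same as m.Value)
Console.WriteLine(m.Groups[1].Value);  // No   (what the parentheses captured)
```

Group 0 is always the whole match, and numbered capture groups start at 1.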

If you don't want a capture group but still need a group for the alternation, use a non-capturing group instead by putting ?: right after the opening parenthesis:

Match m = Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");

Alternatively, if you don't want to change the regex for some reason, you can simply retrieve group 0 from the match, which holds only the whole match and so ignores any capture groups; in your case, use m.Groups[0].Value.
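Both options can be compared side by side. This is only a sketch, assuming C# 9 or later for top-level statements; the variable names nonCapturing and capturing are mine, not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";

// Option 1: non-capturing group, so only group 0 (the whole match) exists.
Match nonCapturing = Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");
Console.WriteLine(nonCapturing.Groups.Count);  // 1
Console.WriteLine(nonCapturing.Value);         // No12

// Option 2: keep the capturing group, but read only group 0.
Match capturing = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");
Console.WriteLine(capturing.Groups[0].Value);  // No12
Console.WriteLine(capturing.Value);            // No12 (shorthand for group 0)
```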

Lastly, you can improve the efficiency of the regex by a notch by factoring the shared leading N out of the alternation:

Match m = Regex.Match(alfaNumericWord, @"N(?:o|um)\d{1,3}");
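If you want to convince yourself that the factored pattern matches the same text as the original alternation, a quick check like the one below should do. The sample strings are made up for illustration, top-level statements assume C# 9 or later, and the sketch only compares results; it does not measure speed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs: two that should match and two that should not.
string[] samples = { "FooNo12Bee", "BarNum345Baz", "NoDigitsHere", "N7" };

var alternation = new Regex(@"(?:No|Num)\d{1,3}");
var factored = new Regex(@"N(?:o|um)\d{1,3}");

bool allAgree = true;
foreach (string s in samples)
{
    // Match.Value is an empty string when the pattern does not match.
    string a = alternation.Match(s).Value;
    string b = factored.Match(s).Value;
    Console.WriteLine($"{s}: '{a}' vs '{b}'");
    if (a != b) allAgree = false;
}

Console.WriteLine(allAgree);  // True
```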

I don't know the official name for it, but it happens because putting parentheses around the alternation creates a new group. It is well explained here:

Besides grouping part of a regular expression together, parentheses also create a numbered capturing group. It stores the part of the string matched by the part of the regular expression inside the parentheses.

The regex Set(Value)? matches Set or SetValue. In the first case, the first (and only) capturing group remains empty. In the second case, the first capturing group matches Value.
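The same example can be tried in C#. A small sketch, assuming C# 9 or later for top-level statements; the names shortForm and longForm are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

Match shortForm = Regex.Match("Set", @"Set(Value)?");
Match longForm = Regex.Match("SetValue", @"Set(Value)?");

// "Set": the optional group did not take part in the match.
Console.WriteLine(shortForm.Value);              // Set
Console.WriteLine(shortForm.Groups[1].Success);  // False
Console.WriteLine(shortForm.Groups[1].Value);    // (empty string)

// "SetValue": the group captured "Value".
Console.WriteLine(longForm.Value);               // SetValue
Console.WriteLine(longForm.Groups[1].Success);   // True
Console.WriteLine(longForm.Groups[1].Value);     // Value
```

In .NET, checking Groups[1].Success tells you whether the group participated at all, which is more reliable than testing for an empty string.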

It is because the parentheses create a group. You can make the group non-capturing with ?: like so: Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");
