Weird Regex behavior in C#

I am trying to extract some alphanumeric expressions out of a longer word in C# using regular expressions. For example, I have the word "FooNo12Bee". I use the following regular expression code, which gives me two results, "No12" and "No":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");

If I use the following expression, without parentheses and without any alternative for "No", it works the way I expect and returns only "No12":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"No\d{1,3}");

What is the difference between these two expressions? Why does using parentheses produce a redundant result for "No"?

Parentheses in a regex are capture groups, meaning that whatever is matched between the parentheses is captured and stored as a group. The call still finds a single match, "No12"; the extra "No" is the content of capture group 1, not a second match.
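
To make the group layout concrete, here is a minimal sketch that lists every group of the first match. The class name CaptureGroupDemo and the helper GroupValues are made up for this illustration; only Regex.Match, Match.Groups and Group.Value come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Hypothetical helper: returns the value of every group in the
    // first match, with group 0 (the whole match) first.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        string[] values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("FooNo12Bee", @"(No|Num)\d{1,3}");
        // Group 0 is the whole match; group 1 is what (No|Num) captured.
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine($"Groups[{i}] = {values[i]}");
        }
        // Prints:
        // Groups[0] = No12
        // Groups[1] = No
    }
}
```

So there is one Match object with two entries in its Groups collection, which is probably what looked like two matches.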

If you don't want a capture group but still need a group for the alternation, use a non-capturing group instead, by putting ?: right after the opening parenthesis:

Match m = Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");
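
A quick way to check the effect is to compare Groups.Count for the two patterns. This is only a sketch; NonCapturingDemo and GroupCount are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Hypothetical helper: number of groups on the first match,
    // counting group 0 (the whole match).
    public static int GroupCount(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        string word = "FooNo12Bee";
        // Capturing group: group 0 plus group 1.
        Console.WriteLine(GroupCount(word, @"(No|Num)\d{1,3}"));   // 2
        // Non-capturing group: only group 0 remains.
        Console.WriteLine(GroupCount(word, @"(?:No|Num)\d{1,3}")); // 1
        // The matched text itself is unchanged.
        Console.WriteLine(Regex.Match(word, @"(?:No|Num)\d{1,3}").Value); // No12
    }
}
```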

Alternatively, if you don't want to change the regex for some reason, you can simply retrieve group 0 from the match to get only the whole match (and thus ignore any capture groups); in your case, that is m.Groups[0].Value.

Lastly, you can improve the efficiency of the regex by a notch by factoring out the leading N that both alternatives share:

Match m = Regex.Match(alfaNumericWord, @"N(?:o|um)\d{1,3}");
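
The sketch below does not measure speed; it only checks that the factored pattern returns the same text as the original alternation on a few sample inputs. The sample strings and the names EquivalenceDemo and FirstMatch are assumptions made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EquivalenceDemo
{
    public const string Original = @"(?:No|Num)\d{1,3}";
    public const string Factored = @"N(?:o|um)\d{1,3}";

    // Hypothetical helper: text of the first match, or "" if none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string[] samples = { "FooNo12Bee", "Num7", "N12", "Nu5", "No1234" };
        foreach (string s in samples)
        {
            string a = FirstMatch(s, Original);
            string b = FirstMatch(s, Factored);
            // Both patterns should agree on every sample.
            Console.WriteLine($"{s}: \"{a}\" / \"{b}\" same={a == b}");
        }
    }
}
```

Whether the factored form is measurably faster depends on the input and the regex engine, so treat the gain as small unless you benchmark it.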

I can't say what it is officially called, but it happens because putting parentheses around that part creates a new group. It is well explained here:

Besides grouping part of a regular expression together, parentheses also create a numbered capturing group. It stores the part of the string matched by the part of the regular expression inside the parentheses.

The regex Set(Value)? matches Set or SetValue. In the first case, the first (and only) capturing group remains empty. In the second case, the first capturing group matches Value.
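
The quoted Set(Value)? example can be tried in C# as follows. This is a sketch; OptionalGroupDemo and Describe are invented names. In .NET, a group that took no part in the match reports Success as false and an empty Value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Hypothetical helper: describes the whole match and group 1.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, @"Set(Value)?");
        Group g = m.Groups[1];
        return g.Success
            ? $"{m.Value}: group 1 = {g.Value}"
            : $"{m.Value}: group 1 did not participate";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Set"));      // Set: group 1 did not participate
        Console.WriteLine(Describe("SetValue")); // SetValue: group 1 = Value
    }
}
```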

It is because the parentheses are creating a group. You can make the group non-capturing with ?: like so: Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}"); Note the single backslash: inside a verbatim string (@"..."), \\d would match a literal backslash followed by the letter d rather than a digit.
