
Regular Expression Groups in C#

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
    ...

For the input user == "Josh Smith [jsmith]":

matches.Count == 1
matches[0].Value == "[jsmith]"

... which I understand. But then:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

Looking at this question, from what I understand the Groups collection stores the entire match as well as the previous match. But doesn't the regexp above match only [open square bracket][text][close square bracket]? So why would "jsmith" match?

Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?

  • match.Groups[0] is always the same as match.Value, which is the entire match.
  • match.Groups[1] is the first capturing group in your regular expression.
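Applied to the pattern from the question, the two bullets above can be checked with a small program. This is only a minimal sketch: the class name BracketGroups and the helper ExtractBracketed are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketGroups
{
    // Hypothetical helper: returns the text inside the first [...] pair,
    // or an empty string when the input contains no bracketed text.
    public static string ExtractBracketed(string input)
    {
        Match match = Regex.Match(input, @"\[(.*?)\]");
        // Groups[0] is the whole match; Groups[1] is what (.*?) captured.
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Match match = Regex.Match("Josh Smith [jsmith]", @"\[(.*?)\]");
        Console.WriteLine(match.Value);            // [jsmith]
        Console.WriteLine(match.Groups[0].Value);  // [jsmith]
        Console.WriteLine(match.Groups[1].Value);  // jsmith
        Console.WriteLine(match.Groups.Count);     // 2
    }
}
```

The brackets are part of the match because \[ and \] are outside the parentheses; only the part inside ( ) ends up in Groups[1].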

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

  • match.Value is "[john] John Johnson".
  • match.Groups[0] is always the same as match.Value, "[john] John Johnson".
  • match.Groups[1] is the group of captures from the (.*?), here "john".
  • match.Groups[2] is the group of captures from the (.*), here " John Johnson" (including the leading space).
  • match.Groups[1].Captures is yet another dimension.
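The list above also answers the "always exactly 2 groups?" part of the question: the Groups collection holds the whole match plus one entry per capturing group in the pattern, so this pattern yields three. A rough sketch for inspecting that follows; the names TwoGroups and GroupValues are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoGroups
{
    // Hypothetical helper: collects the Value of every group of the first match.
    public static string[] GroupValues(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        var values = new string[match.Groups.Count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = match.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("ignored [john] John Johnson", @"\[(.*?)\](.*)");
        for (int i = 0; i < values.Length; i++)
        {
            // Quotes make the leading space in Groups[2] visible.
            Console.WriteLine($"Groups[{i}] = \"{values[i]}\"");
        }
        // Groups[0] = "[john] John Johnson"
        // Groups[1] = "john"
        // Groups[2] = " John Johnson"
    }
}
```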

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!

  • match.Groups[0] is always the same as match.Value, "[john][johnny]".
  • match.Groups[1] is the group of captures from the (\[.*?\])+ . Because the group repeats, match.Groups[1].Value holds only its last capture, "[johnny]".
  • match.Groups[1].Captures[0] is "[john]".
  • match.Groups[1].Captures[1] is "[johnny]", the same as match.Groups[1].Value.
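To see how Value and Captures relate for a repeated group, a small check like the following can be run. It is a sketch under the assumption that the pattern is the verbatim string shown in the example; RepeatedGroupCaptures and CaptureValues are illustrative names. In .NET, a group's Value is its last capture, and Captures lists every iteration in order, starting at index 0.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedGroupCaptures
{
    // Hypothetical helper: returns every capture made by group 1, in order.
    public static string[] CaptureValues(string input)
    {
        Match match = Regex.Match(input, @"(\[.*?\])+");
        CaptureCollection captures = match.Groups[1].Captures;
        var values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        Match match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
        Console.WriteLine(match.Value);            // [john][johnny]
        Console.WriteLine(match.Groups[1].Value);  // [johnny] (the last capture)

        foreach (string capture in CaptureValues("[john][johnny]"))
        {
            Console.WriteLine(capture);            // [john], then [johnny]
        }
    }
}
```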

The ( ) acts as a capture group. So the matches collection has all of the matches that C# finds in your string, and each match's Groups collection has the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).

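The parentheses identify a group as well, so the first entry (Groups[0]) is the entire match and the second (Groups[1]) is the content found between the square brackets.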

How? The answer is here:

(.*?)

That is a subgroup of @"\[(.*?)\]".

Groups[0] is the entire match ("[jsmith]" for the input above), not the whole input string.

Groups[1] is the group captured by the parentheses (.*?). You can configure Regex to capture explicit groups only (pass RegexOptions.ExplicitCapture when you create the regex), or use (?:.*?) to create a non-capturing group.
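Both ways of suppressing the capture can be compared by counting groups. The sketch below is only illustrative: NonCapturing and GroupCount are made-up names, and the group name login in the last call is a hypothetical example of an explicitly named group, which RegexOptions.ExplicitCapture still captures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturing
{
    // Hypothetical helper: number of entries in Groups for the first match.
    public static int GroupCount(string input, string pattern,
                                 RegexOptions options = RegexOptions.None)
    {
        return Regex.Match(input, pattern, options).Groups.Count;
    }

    public static void Main()
    {
        string user = "Josh Smith [jsmith]";

        // Plain parentheses capture: whole match + one group.
        Console.WriteLine(GroupCount(user, @"\[(.*?)\]"));                               // 2
        // (?: ... ) groups without capturing: whole match only.
        Console.WriteLine(GroupCount(user, @"\[(?:.*?)\]"));                             // 1
        // ExplicitCapture turns unnamed parentheses into non-capturing groups.
        Console.WriteLine(GroupCount(user, @"\[(.*?)\]", RegexOptions.ExplicitCapture)); // 1
        // A named group is explicit, so it is still captured.
        Console.WriteLine(GroupCount(user, @"\[(?<login>.*?)\]",
                                     RegexOptions.ExplicitCapture));                     // 2
    }
}
```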

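Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.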

 