简体   繁体   中英

Regular Expression Groups in C#

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
    ...

For the input user == "Josh Smith [jsmith]" :

matches.Count == 1
matches[0].Value == "[jsmith]"

... which I understand. But then:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

Looking at this question from what I understand the Groups collection stores the entire match as well as the previous match. But, doesn't the regexp above match only for [open square bracket] [text] [close square bracket] so why would "jsmith" match?

Also, is it always the case the the groups collection will store exactly 2 groups: the entire match and the last match?

  • match.Groups[0] is always the same as match.Value , which is the entire match.
  • match.Groups[1] is the first capturing group in your regular expression.

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

  • match.Value is "[john] John Johnson"
  • match.Groups[0] is always the same as match.Value , "[john] John Johnson" .
  • match.Groups[1] is the group of captures from the (.*?) .
  • match.Groups[2] is the group of captures from the (.*) .
  • match.Groups[1].Captures is yet another dimension.

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures !

  • match.Groups[0] is always the same as match.Value , "[john][johnny]" .
  • match.Groups[1] is the group of captures from the (\\[.*?\\])+ . The same as match.Value in this case.
  • match.Groups[1].Captures[0] is the same as match.Groups[1].Value
  • match.Groups[1].Captures[1] is [john]
  • match.Groups[1].Captures[2] is [johnny]

The ( ) acts as a capture group. So the matches array has all of matches that C# finds in your string and the sub array has the values of the capture groups inside of those matches. If you didn't want that extra level of capture jut remove the ( ) .

括号也标识一个组,因此匹配项1是整个匹配项,而匹配项2是在方括号之间找到的内容。

How? The answer is here

(.*?)

That is a subgroup of @"[(.*?)];

Groups[0] is your entire input string.

Groups[1] is your group captured by parentheses (.*?) . You can configure Regex to capture Explicit groups only (there is an option for that when you create a regex), or use (?:.*?) to create a non-capturing group.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM