Regex - Match versus Groups

Apologies in advance if this is a duplicate, but I could not find existing answers that address my questions.

Could you please help and explain:

  1. Where is the match or capture for name alone held? The initial part of the pattern, [A-Za-z0-9_\-\.]+, is not in parentheses, so I understand it is not a group. How, then, is name captured and held as a component of Match 0?

  2. If I change the string t2 to name@domain.com alt@yahoo.net and the pattern to ^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$

    • I would expect 2 matches, one for each full email address. The output shows only 1 match holding both, separated by a space. Why?
    • How should the pattern read to get 2 matches, or would the string need to be different for this pattern?
    • I don't see the consistency in the Group output: there is no Group holding capture 0 = com and capture 1 = net, the way Group 2 holds the domain. and yahoo. captures. Why?
    • Group 3's captures seem to hold the captures of Group 2's captures 0 and 1. Is that how hierarchies work: are there captures of captures of groups?

Code

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string t2 = "name@domain.com";
        string p2 = @"^[A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";

        MatchCollection matches = Regex.Matches(t2, p2);
        int matchIndex = 0;

        foreach (Match nextMatch in matches)
        {
            // Match.Value is the text matched by the whole pattern.
            Console.WriteLine("Match {0} holds: {1}", matchIndex, nextMatch.Value);
            matchIndex++;

            // Groups[0] is the whole match; the rest are the parenthesised groups.
            int groupIndex = 0;
            foreach (Group g in nextMatch.Groups)
            {
                // Group.Value is the last capture of that group.
                Console.WriteLine("Group {0} holding: {1}", groupIndex, g.Value);
                groupIndex++;

                // Captures lists every substring the group matched, in order.
                int captureIndex = 0;
                foreach (Capture capture in g.Captures)
                {
                    Console.WriteLine("\tCapture {0} holds {1}", captureIndex, capture.Value);
                    captureIndex++;
                }
            }
        }
    }
}

Output for the above code:

Match 0 holds: name@domain.com
Group 0 holding: name@domain.com
    Capture 0 holds name@domain.com
Group 1 holding: domain.
    Capture 0 holds domain.
Group 2 holding: n
    Capture 0 holds d
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds a
    Capture 4 holds i
    Capture 5 holds n
Group 3 holding: m
    Capture 0 holds c
    Capture 1 holds o
    Capture 2 holds m
Press any key to continue . . .

Output if string t2 = "name@domain.com alt@yahoo.net"; and string p2 = @"^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$";

Match 0 holds: name@domain.com alt@yahoo.net
Group 0 holding: name@domain.com alt@yahoo.net
    Capture 0 holds name@domain.com alt@yahoo.net
Group 1 holding:  alt@yahoo.net
    Capture 0 holds name@domain.com
    Capture 1 holds  alt@yahoo.net
Group 2 holding: yahoo.
    Capture 0 holds domain.
    Capture 1 holds yahoo.
Group 3 holding: o
    Capture 0 holds d
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds a
    Capture 4 holds i
    Capture 5 holds n
    Capture 6 holds y
    Capture 7 holds a
    Capture 8 holds h
    Capture 9 holds o
    Capture 10 holds o
Group 4 holding: t
    Capture 0 holds c
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds n
    Capture 4 holds e
    Capture 5 holds t
Press any key to continue . . .

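A Match exists when the regex can be applied to the given string, and it covers everything matched by the entire regex, including the parts outside any parentheses. That is why name appears in Match 0 (and in Group 0, which is the whole match) without belonging to any numbered group.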
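Because a match spans the whole pattern, the ^ and $ anchors in the question force a single match over the full string. A minimal sketch of one way to get one match per address follows. The pattern is a hypothetical variant of the question's, not something from the original post: it drops the anchors, drops the space from the first character class, and uses non-capturing groups (?:...).

```csharp
using System;
using System.Text.RegularExpressions;

class MatchScopeSketch
{
    static void Main()
    {
        string input = "name@domain.com alt@yahoo.net";

        // Assumed variant of the question's pattern: no ^/$ anchors and no
        // space in the first character class, so each address matches separately.
        string pattern = @"[A-Za-z0-9_\-\.]+@(?:[A-Za-z0-9\-]+\.)+[A-Za-z\-]+";

        foreach (Match m in Regex.Matches(input, pattern))
        {
            // The part before '@' is not inside any capturing group, so it is
            // only available as part of m.Value (which is also Groups[0]).
            Console.WriteLine("Match at {0}: {1}", m.Index, m.Value);
        }
        // Prints:
        // Match at 0: name@domain.com
        // Match at 16: alt@yahoo.net
    }
}
```

With the original anchored pattern, the same input can only match as a whole, which is why the question's output shows one match containing both addresses.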

Groups are parts of that Match, and Captures are all the individual captures of a Group; a Group has more than one capture when it is repeated, as in (someRegex)+. Try changing ([A-Za-z\-])+ to ([A-Za-z\-]+) and see the difference!
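The suggested change can be tried in isolation with a small sketch like the one below. The Describe helper and the input "com" are illustrative choices, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

class QuantifierPlacementSketch
{
    // Hypothetical helper: prints group 1's value and how many captures it has.
    static void Describe(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        Group g = m.Groups[1];
        Console.WriteLine("{0} -> group \"{1}\", {2} capture(s)",
            pattern, g.Value, g.Captures.Count);
    }

    static void Main()
    {
        // Quantifier outside the parentheses: the group is re-captured once per
        // character, and Group.Value keeps only the last capture.
        Describe(@"([A-Za-z\-])+", "com");   // group "m", 3 capture(s)

        // Quantifier inside the parentheses: the group is captured once and
        // holds the whole run of characters.
        Describe(@"([A-Za-z\-]+)", "com");   // group "com", 1 capture(s)
    }
}
```

This is the same effect seen in the question's output, where the last group holds m but has the captures c, o, m.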

Examples:

\w*(123)\w* on "asdsa123asdf"

  1. Match -> asdsa123asdf
  2. Group -> 123 (== last capture)
  3. Captures -> 123

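\w*?([123])+\w* on "asdsa123asdf" (the leading \w*? is lazy; a greedy \w* would also consume "12", since digits are word characters, and leave only 3 for the group)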

  1. Match -> asdsa123asdf
  2. Group -> 3 (== last capture)
  3. Captures -> 1, 2, 3
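These examples can be checked with a short sketch such as the following. The Show helper is an illustrative name. Note that whether the leading \w* is greedy or lazy changes how many captures the repeated group records, because digits are also matched by \w.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class CaptureListingSketch
{
    // Hypothetical helper: prints the match, group 1's value, and its captures.
    static void Show(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        Group g = m.Groups[1];

        var captures = new List<string>();
        foreach (Capture c in g.Captures)
        {
            captures.Add(c.Value);
        }

        Console.WriteLine("{0}: match={1}, group={2}, captures={3}",
            pattern, m.Value, g.Value, string.Join(", ", captures));
    }

    static void Main()
    {
        string input = "asdsa123asdf";

        // Single, unrepeated group: one capture, equal to the group value.
        Show(@"\w*(123)\w*", input);      // group=123, captures=123

        // Lazy prefix: the repeated group sees 1, 2 and 3 in turn.
        Show(@"\w*?([123])+\w*", input);  // group=3, captures=1, 2, 3

        // Greedy prefix: \w* keeps "asdsa12", so the group only captures 3.
        Show(@"\w*([123])+\w*", input);   // group=3, captures=3
    }
}
```

In all three cases the match itself is the full string asdsa123asdf; only the group's capture list differs.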

Several sites let you test a regex and inspect its details, e.g. https://regexr.com or https://regex101.com
