
Regular Expression named capturing groups support in Java 7

Since Java 7, the regular expressions API has offered support for named capturing groups. The method java.util.regex.Matcher.group(String) returns the input subsequence captured by the given named capturing group, but there is no example available in the API documentation.

What is the right syntax to specify and retrieve a named capturing group in Java 7?

Specifying named capturing group

Take the following regex with a single capturing group as an example: ([Pp]attern)

Below are 4 examples of how to specify a named capturing group for the regex above:

(?<Name>[Pp]attern)
(?<group1>[Pp]attern)
(?<name>[Pp]attern)
(?<NAME>[Pp]attern)

Note that the name of the capturing group must strictly match the following pattern:

[A-Za-z][A-Za-z0-9]*

The group name is case-sensitive, so you must specify the exact group name when you refer to it (see below).
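As a minimal sketch of the naming rules above, the following checks which names Pattern.compile accepts and shows that lookup by name is case-sensitive. The class name GroupNameRules and its two helper methods are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class GroupNameRules {

    // Returns true if Pattern.compile accepts the given group name.
    static boolean isValidGroupName(String name) {
        try {
            Pattern.compile("(?<" + name + ">[Pp]attern)");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if a group defined as definedName can be read back as lookupName.
    static boolean canLookUp(String definedName, String lookupName) {
        Matcher m = Pattern.compile("(?<" + definedName + ">[Pp]attern)").matcher("Pattern");
        if (!m.find()) {
            return false;
        }
        try {
            m.group(lookupName);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the pattern has no group with that exact name
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidGroupName("group1"));  // a letter followed by letters/digits
        System.out.println(isValidGroupName("my_name")); // contains an underscore
        System.out.println(isValidGroupName("1group"));  // starts with a digit
        System.out.println(canLookUp("name", "name"));   // exact name
        System.out.println(canLookUp("name", "NAME"));   // same letters, different case
    }
}
```

Only the first and fourth calls return true; the others are rejected with a PatternSyntaxException or an IllegalArgumentException respectively.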

Backreference the named capturing group in regex

To back-reference the content matched by a named capturing group in the regex (corresponding to the 4 examples above):

\k<Name>
\k<group1>
\k<name>
\k<NAME>

The named capturing group is still numbered, so in all 4 examples it can also be back-referenced with \1 as usual (written "\\1" in a Java string literal).
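To illustrate both forms of back-reference, here is a small sketch that finds an immediately repeated word. The class RepeatedWordFinder, the group name "word" and the sample text are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWordFinder {

    // The same regex twice: once with a named back-reference, once with a numbered one.
    static final Pattern BY_NAME = Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");
    static final Pattern BY_NUMBER = Pattern.compile("\\b(?<word>\\w+)\\s+\\1\\b");

    // Returns the first word that is immediately repeated, or null if there is none.
    static String firstRepeatedWord(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        String text = "This is is a test";
        System.out.println(firstRepeatedWord(BY_NAME, text));   // is
        System.out.println(firstRepeatedWord(BY_NUMBER, text)); // is
    }
}
```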

Refer to named capturing group in replacement string

To refer to the capturing group in the replacement string (corresponding to the 4 examples above):

${Name}
${group1}
${name}
${NAME}

As above, in all 4 examples the content of the capturing group can also be referred to with $1 in the replacement string.
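The replacement syntax can be sketched with Matcher.replaceAll. The class DateReorder, the group names year/month/day and the sample input are hypothetical; the point is only that ${name} and $n refer to the same captured text.

```java
import java.util.regex.Pattern;

class DateReorder {

    static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Rewrites yyyy-MM-dd as dd/MM/yyyy using named references.
    static String byName(String text) {
        return ISO_DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    // Same rewrite using numbered references to the same groups.
    static String byNumber(String text) {
        return ISO_DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(byName("Due 2024-05-17"));   // Due 17/05/2024
        System.out.println(byNumber("Due 2024-05-17")); // Due 17/05/2024
    }
}
```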

Named capturing group in COMMENT mode

This section uses (?<name>[Pp]attern) as an example.

Oracle's implementation of COMMENT mode (the Pattern.COMMENTS flag, or the embedded flag (?x)) parses the following examples as identical to the regex above:

(?x)  (  ?<name>             [Pp] attern  )
(?x)  (  ?<  name  >         [Pp] attern  )
(?x)  (  ?<  n  a m    e  >  [Pp] attern  )

Except for ?<, which must not be split, arbitrary whitespace is allowed, even inside the name of the capturing group.

Same name for different capturing groups?

While .NET, Perl and PCRE allow different capturing groups to share the same name, Java does not support this (as of Java 8): each group name can be used only once per pattern.
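A quick way to see this restriction is to try compiling a pattern that reuses a name. In this sketch the class DuplicateGroupName and the helper compiles are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateGroupName {

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two alternatives with distinct group names: accepted.
        System.out.println(compiles("(?<digits>\\d+)|(?<letters>[a-z]+)"));
        // The same name used for both alternatives: rejected.
        System.out.println(compiles("(?<token>\\d+)|(?<token>[a-z]+)"));
    }
}
```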

Named capturing group related APIs

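New methods in the Matcher class support retrieving captured text and its position by group name:

group(String name) (Java 7)
start(String name) (Java 8)
end(String name) (Java 8)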

The corresponding methods are missing from the MatchResult interface as of Java 8. There is an ongoing enhancement request, JDK-8065554, for this issue.
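The name-based Matcher methods can be sketched as follows. Note that Matcher.start(String) and Matcher.end(String) require Java 8 or later, while Matcher.group(String) is available from Java 7. The class NamedGroupPositions, the key=value pattern and the sample input are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupPositions {

    static final Pattern KEY_VALUE = Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

    // Returns "text@start-end" for the named group in the first match, or null if nothing matches.
    static String describe(String input, String groupName) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(groupName) + "@" + m.start(groupName) + "-" + m.end(groupName);
    }

    public static void main(String[] args) {
        System.out.println(describe("set color=red now", "key"));   // color@4-9
        System.out.println(describe("set color=red now", "value")); // red@10-13
    }
}
```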

As of Java 8, there is no API to get the list of named capturing groups in a regex, so we have to jump through extra hoops to get it. That said, such a list is of little use for most purposes other than writing a regex tester.
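One possible workaround, shown here only as a naive sketch and not as the approach the answer above had in mind, is to scan the regex source text for named-group openings. The class GroupNameScanner is hypothetical, and the scan deliberately ignores escaped parentheses, character classes, \Q...\E quoting and COMMENT-mode spacing, so it can report false positives or miss names in such patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNameScanner {

    // Matches the opening of a named group, e.g. "(?<year>". Look-behinds such as
    // "(?<=" and "(?<!" are skipped because '=' and '!' cannot start a group name.
    static final Pattern NAMED_GROUP_OPEN =
            Pattern.compile("\\(\\?<([A-Za-z][A-Za-z0-9]*)>");

    // Returns the group names in the order they appear in the regex source.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP_OPEN.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Two named groups followed by a negative look-behind.
        System.out.println(groupNames("(?<year>\\d{4})-(?<month>\\d{2})(?<!00)")); // [year, month]
    }
}
```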

The syntax for a named capturing group is (?<name>X), where X is the sub-pattern to capture and "name" is the name of the group. The following code captures the regex (\\w+) (one or more word characters: letters, digits or underscores). To name this capturing group, add the expression ?<teste> inside the parentheses, just before the sub-pattern to be captured.

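import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("(?<teste>\\w+)");
Matcher matcher = pattern.matcher("The first word is a match");
if (matcher.find()) { // group(String) may only be called after a successful match
    String myNamedGroup = matcher.group("teste");
    System.out.printf("This is your named group: %s", myNamedGroup);
}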

This code prints the following output:

This is your named group: The
