
Java nested regular expression groups not capturing inner groups

Why is this regex not capturing the innermost group?

final String regex = "s(tra(nsformer))";
final Pattern pattern = Pattern.compile(regex);
final Matcher match = pattern.matcher("stransformer");

if (match.matches()) {
    System.out.println(match.groupCount());
    for (int i = 0; i < match.groupCount(); i++) {
        System.out.println(match.group(i));
    }
}

The above prints (on JDK 7):

2
stransformer
transformer

Oddly enough, the pattern "s(tra((nsformer)))" works as intended. So does "s(tra(?<inner>nsformer))" when I refer to the match as group("inner").

What are we missing?

Capturing group indices run from 1 to N, where N is the value returned by groupCount(). As per the Matcher.groupCount() javadoc:

Group zero denotes the entire pattern by convention. It is not included in this count.

so the code should be:

for (int i = 1; i <= match.groupCount(); i++) {
    System.out.println(match.group(i));
}

which prints:

transformer
nsformer
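For completeness, here is a minimal self-contained sketch of the corrected loop. The class name GroupListing and the helper capturedGroups are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class GroupListing {

    // Returns capturing groups 1..groupCount() of a full match,
    // or an empty list if the input does not match the pattern.
    static List<String> capturedGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<String>();
        if (m.matches()) {
            // Start at 1 (group 0 is the whole match) and include groupCount().
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints "transformer" and then "nsformer".
        for (String g : capturedGroups("s(tra(nsformer))", "stransformer")) {
            System.out.println(g);
        }
    }
}
```

Collecting the groups into a list rather than printing them directly is only a convenience here; it makes the result easy to inspect.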

Capturing group numbering starts at index 1. Index 0 holds the entire match. From the javadoc for group(int):

Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
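The quoted javadoc can be checked directly. The sketch below uses a hypothetical class GroupZero with made-up helper names; it only confirms that group(0) and group() agree and that groupCount() does not count group zero.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class GroupZero {

    // True when the input fully matches and group(0), group() and the input are all equal.
    static boolean groupZeroIsWholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches()
                && m.group(0).equals(m.group())
                && m.group().equals(input);
    }

    // Number of capturing groups in the pattern; group zero is not counted.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupZeroIsWholeMatch("s(tra(nsformer))", "stransformer")); // prints true
        System.out.println(countGroups("s(tra(nsformer))")); // prints 2
    }
}
```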

So make sure the loop runs one more iteration, for example by changing the < to <=.

Matcher.group(int) javadoc

That also explains why it appears to work when you add an extra capturing group: the count increases to 3, so the loop prints groups 0 through 2, which includes one of the two innermost groups but not the last one. Naming the capturing group works for the obvious reason: nothing was wrong with the pattern, only with the loop that listed the groups.
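To make this concrete, the sketch below reruns the question's loop bounds against both patterns and looks up a named group. The class OffByOneDemo and its methods are hypothetical names, and the named-group pattern assumes the standard Java syntax (?<inner>...), available since Java 7.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the behaviour described above.
class OffByOneDemo {

    // Same bounds as the loop in the question: i from 0, strictly below groupCount().
    static List<String> questionLoop(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> seen = new ArrayList<String>();
        if (m.matches()) {
            for (int i = 0; i < m.groupCount(); i++) {
                seen.add(m.group(i));
            }
        }
        return seen;
    }

    // Looks a group up by name, so loop bounds play no part; returns null if there is no match.
    static String namedGroup(String regex, String input, String name) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        // Two groups: the loop stops before group 2, so "nsformer" is never printed.
        System.out.println(questionLoop("s(tra(nsformer))", "stransformer"));
        // prints [stransformer, transformer]

        // Three groups: the loop now reaches group 2 ("nsformer") but still skips group 3.
        System.out.println(questionLoop("s(tra((nsformer)))", "stransformer"));
        // prints [stransformer, transformer, nsformer]

        System.out.println(namedGroup("s(tra(?<inner>nsformer))", "stransformer", "inner"));
        // prints nsformer
    }
}
```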
