Java nested regular expression groups not capturing inner groups

Why is this regex not capturing the innermost group?

final String regex = "s(tra(nsformer))";
final Pattern pattern = Pattern.compile(regex);
final Matcher match = pattern.matcher("stransformer");

if (match.matches()) {
    System.out.println(match.groupCount());
    for (int i = 0; i < match.groupCount(); i++) {
        System.out.println(match.group(i));
    }
}

The above prints (on JDK 7):

2
stransformer
transformer

Oddly enough, the pattern "s(tra((nsformer)))" works as intended. So does "s(tra(?<inner>nsformer))" when I refer to the match as group("inner").

What are we missing?

Group indices run from 1 to N, where N is the value returned by groupCount(). As per the Matcher.groupCount() javadoc:

Group zero denotes the entire pattern by convention. It is not included in this count.

So the loop should be:

for (int i = 1; i <= match.groupCount(); i++) {
    System.out.println(match.group(i));
}

which prints:

transformer
nsformer
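As a self-contained check of the corrected loop, here is a minimal sketch. The class name GroupListing and the helper capturedGroups are made up for illustration; only Pattern and Matcher come from java.util.regex. Group 0 is printed separately via group(0), and the loop itself yields only groups 1 and 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupListing {
    // Hypothetical helper: returns the text of capturing groups 1..groupCount(),
    // or an empty list if the whole input does not match.
    static List<String> capturedGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            // Start at 1 and include groupCount(): group 0 is the whole match
            // and is not counted by groupCount().
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("s(tra(nsformer))").matcher("stransformer");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 2
            System.out.println(m.group(0));     // stransformer
        }
        // [transformer, nsformer]
        System.out.println(capturedGroups("s(tra(nsformer))", "stransformer"));
    }
}
```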

Group indices start at 1. Index 0 holds the entire match. From the javadoc for group(int):

Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().

So make sure the loop iterates one more step, for example by changing the < to <=.

Reference: Matcher.group(int) javadoc

That also explains why adding an extra capturing group appears to work: the count increases by one, so the loop reaches one of the two innermost groups but still skips the last one. Naming the capturing group works for the obvious reason that nothing was wrong with the capture itself; only the way the groups were listed was wrong.
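To see this off-by-one effect concretely, the sketch below replays the question's loop (start at 0, stop before groupCount()) against both patterns, then reads the named group directly. The class OffByOneDemo and its two helper methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffByOneDemo {
    // Reproduces the question's loop: i starts at 0 and stops before groupCount(),
    // so group 0 is listed and the last capturing group is skipped.
    static List<String> questionLoop(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int i = 0; i < m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    // Looks a group up by name, bypassing index arithmetic entirely.
    static String namedGroup(String regex, String input, String name) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        // Two groups: the loop visits indices 0 and 1 and never reaches group 2.
        System.out.println(questionLoop("s(tra(nsformer))", "stransformer"));
        // [stransformer, transformer]

        // Three groups: the loop visits 0, 1, 2, so "nsformer" shows up as group 2,
        // while group 3 (also "nsformer") is still skipped.
        System.out.println(questionLoop("s(tra((nsformer)))", "stransformer"));
        // [stransformer, transformer, nsformer]

        // Named lookup does not depend on the loop bounds at all.
        System.out.println(namedGroup("s(tra(?<inner>nsformer))", "stransformer", "inner"));
        // nsformer
    }
}
```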
