Java regex - overlapping matches

In the following code:

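import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+\\D+\\d+").matcher("2abc3abc4abc5");
        while (m.find()) {
            allMatches.add(m.group());
        }

        String[] res = allMatches.toArray(new String[0]);
        System.out.println(Arrays.toString(res));
    }
}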

The result is:

[2abc3, 4abc5]

I'd like it to be:

[2abc3, 3abc4, 4abc5]

How can it be achieved?

Make the matcher start its next scan at the second \d+: capture it in a group and pass that group's start index to find(int).

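// Group 1 captures the trailing digits so that we know where they start.
Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher("2abc3abc4abc5");
if (m.find()) {
    do {
        allMatches.add(m.group());
        // find(int) resets the matcher and resumes scanning at the given index.
    } while (m.find(m.start(1)));
}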
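
For anyone who wants to try this as is, here is a minimal self-contained sketch of the same idea. The class name RestartOverlap and the method findOverlapping are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RestartOverlap {
    // Same pattern as above: group 1 marks the trailing run of digits.
    private static final Pattern P = Pattern.compile("\\d+\\D+(\\d+)");

    // Hypothetical helper: collects matches, restarting each scan at group 1.
    static List<String> findOverlapping(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        boolean found = m.find();
        while (found) {
            out.add(m.group());
            // start(1) is always past start(), so every restart makes progress.
            found = m.find(m.start(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("2abc3abc4abc5")); // [2abc3, 3abc4, 4abc5]
    }
}
```

Since the restart index is the beginning of the whole digit run, a multi-digit number is reused intact as the start of the next match.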

Not sure if this is possible in Java, but in PCRE you could do the following:
(?=(\d+\D+\d+)).

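Explanation
The technique is to put a capturing group inside a lookahead, and then "eat" one character to move forward. The lookahead itself consumes nothing, so the next attempt starts just one character later and can overlap the previous capture.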

  • (?= : start of positive lookahead
    • ( : start of capturing group 1
      • \d+ : match a digit one or more times
      • \D+ : match a non-digit character one or more times
      • \d+ : match a digit one or more times
    • ) : end of group 1
  • ) : end of lookahead
  • . : match any character; this is what "moves forward"

Thanks to Casimir et Hippolyte, it really does seem to work in Java. You just need to double the backslashes for the Java string literal and read the first capturing group: "(?=(\\d+\\D+\\d+))." Tested on www.regexplanet.com.
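
A minimal sketch of how that might look in Java. The class name LookaheadOverlap and the method captured are hypothetical; the point it shows is that group() only holds the single character eaten by the dot, so the text has to be read from group(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOverlap {
    // Java string literal for the regex (?=(\d+\D+\d+)).
    private static final Pattern OVERLAPPING =
            Pattern.compile("(?=(\\d+\\D+\\d+)).");

    // Hypothetical helper: returns what group 1 captured at each match.
    static List<String> captured(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = OVERLAPPING.matcher(input);
        while (m.find()) {
            // m.group() would be just one character here ("2", "3", "4");
            // the lookahead's capture is in group 1.
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(captured("2abc3abc4abc5")); // [2abc3, 3abc4, 4abc5]
    }
}
```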

HamZa's solution above works perfectly in Java. If you want to find overlapping occurrences of a specific pattern in a text, all you have to do is:

String regex = "\\d+\\D+\\d+";

String updatedRegex = "(?=(" + regex + ")).";

Here regex is the pattern you are looking for. To make the matches overlap, surround it with (?=( at the start and )). at the end, then read group 1 of each match.
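
The wrapping can be put into a small helper, sketched below. OverlapHelper and overlappingMatches are names invented for this example, not part of any library. One thing to watch for: a match is attempted at every character position, so with multi-digit numbers you also get captures that start in the middle of a number. Adding a negative lookbehind such as (?<!\d) to the front of the inner pattern is one possible way around that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapHelper {
    // Hypothetical helper: wraps regex as (?=(regex)). and collects group 1.
    static List<String> overlappingMatches(String regex, CharSequence input) {
        Pattern p = Pattern.compile("(?=(" + regex + ")).");
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the whole wrapped pattern
        }
        return out;
    }

    public static void main(String[] args) {
        String regex = "\\d+\\D+\\d+";
        System.out.println(overlappingMatches(regex, "2abc3abc4abc5"));
        // [2abc3, 3abc4, 4abc5]

        // Every position is tried, so "2abc34" shows up as well.
        System.out.println(overlappingMatches(regex, "12abc34"));
        // [12abc34, 2abc34]

        // Assumed fix: only start where the previous character is not a digit.
        System.out.println(overlappingMatches("(?<!\\d)" + regex, "12abc34abc56"));
        // [12abc34, 34abc56]
    }
}
```

If the inner pattern has capturing groups of its own, their numbers shift up by one, because the wrapper's group becomes group 1.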
