Java regex - overlapping matches
In the following code:
public static void main(String[] args) {
    List<String> allMatches = new ArrayList<String>();
    Matcher m = Pattern.compile("\\d+\\D+\\d+").matcher("2abc3abc4abc5");
    while (m.find()) {
        allMatches.add(m.group());
    }
    String[] res = allMatches.toArray(new String[0]);
    System.out.println(Arrays.toString(res));
}
The result is:
[2abc3, 4abc5]
I'd like it to be:
[2abc3, 3abc4, 4abc5]
How can it be achieved?
Make the matcher start its next scan from the latter \\d+ by capturing it in a group and passing that group's start index to find(int):
Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher("2abc3abc4abc5");
if (m.find()) {
    do {
        allMatches.add(m.group());
    } while (m.find(m.start(1)));
}
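For reference, here is the same restart-the-scan idea as a minimal, self-contained sketch. The class name RestartScanDemo and the helper overlappingMatches are hypothetical names introduced only for this illustration; the pattern and input are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestartScanDemo {
    // Hypothetical helper (not from the original answer): collects every match,
    // restarting each scan at the start of the trailing number captured by group 1.
    public static List<String> overlappingMatches(String input) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher(input);
        if (m.find()) {
            do {
                allMatches.add(m.group());
                // find(int) resets the matcher and searches from the given index.
            } while (m.find(m.start(1)));
        }
        return allMatches;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("2abc3abc4abc5"));
        // prints [2abc3, 3abc4, 4abc5]
    }
}
```

The loop terminates because group 1 always starts after the start of the whole match, so each restart index is strictly larger than the previous one.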
Not sure if this is possible in Java, but in PCRE you could do the following:
(?=(\d+\D+\d+)).
Explanation
The technique is to use a capturing group inside a lookahead, and then "eat" one character to move forward.
(?= : start of positive lookahead
( : start of capturing group 1
\d+ : match a digit one or more times
\D+ : match a non-digit character one or more times
\d+ : match a digit one or more times
) : end of group 1
) : end of lookahead
. : match any character; this is to "move forward"
Thanks to Casimir et Hippolyte, it really seems to work in Java. You just need to double the backslashes (as a Java string literal requires) and display the first capturing group:
(?=(\\d+\\D+\\d+)).
Tested on www.regexplanet.com.
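A minimal Java sketch of the lookahead technique, assuming the same input string as in the question. The class name LookaheadDemo and the helper lookaheadMatches are hypothetical; the point is that the overall match is just the single consumed character, so the text of interest has to be read from group(1) rather than group().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: the lookahead captures the candidate text without
    // consuming it, and the trailing '.' consumes one character so the next
    // find() starts one position further along.
    public static List<String> lookaheadMatches(String input) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile("(?=(\\d+\\D+\\d+)).").matcher(input);
        while (m.find()) {
            allMatches.add(m.group(1)); // group() would only be the single eaten character
        }
        return allMatches;
    }

    public static void main(String[] args) {
        System.out.println(lookaheadMatches("2abc3abc4abc5"));
        // prints [2abc3, 3abc4, 4abc5]
    }
}
```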
The above solution of HamZa works perfectly in Java. If you want to find a specific pattern in a text, all you have to do is:
String regex = "\\d+\\D+\\d+";
String updatedRegex = "(?=(" + regex + ")).";
Here regex is the pattern you are looking for; to make the matches overlap, surround it with "(?=(" at the start and "))." at the end.
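Putting the wrapping step together with a matcher loop might look like the sketch below. The class OverlapHelper and the method findOverlapping are hypothetical names, not part of any library. Because the wrapper's parenthesis opens first, it is always group 1, even if the wrapped regex has capturing groups of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapHelper {
    // Hypothetical helper: wraps any regex in a capturing lookahead and
    // returns what group 1 captured at each position where it matches.
    public static List<String> findOverlapping(String regex, String input) {
        String updatedRegex = "(?=(" + regex + ")).";
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(updatedRegex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("\\d+\\D+\\d+", "2abc3abc4abc5"));
        // prints [2abc3, 3abc4, 4abc5]
        System.out.println(findOverlapping("\\d+\\D+\\d+", "12abc34"));
        // prints [12abc34, 2abc34]
    }
}
```

Note that this approach tries the pattern at every character position, so with multi-digit numbers it also reports matches that begin in the middle of a number, as the second call shows. The find(m.start(1)) approach from the other answer would report only 12abc34 for that input, so which one fits depends on what "overlapping" should mean for your data.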