
Java Regex pattern matching first occurrence of “boundary” after any character sequence

I want a pattern whose capture group ends at the first occurrence of the “boundary”. At the moment the match runs on to the last occurrence instead.

Eg:

String text = "this should match from A to the first B and not 2nd B, got that?";
Pattern ptrn = Pattern.compile("\\b(A.*B)\\b");
Matcher mtchr = ptrn.matcher(text);
while(mtchr.find()) {
    String match = mtchr.group();
    System.out.println("Match = <" + match + ">");
}

prints:

"Match = <A to the first B and not 2nd B>"

and I want it to print:

"Match = <A to the first B>"

What do I need to change within the pattern?

Make your * non-greedy (reluctant) by using *? instead:

Pattern ptrn = Pattern.compile("\\b(A.*?B)\\b");

By default, quantifiers are greedy: .* matches as many characters as possible while still letting the overall pattern match, which here means everything up to the last B.
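To see the difference side by side, here is a minimal, self-contained sketch that runs both patterns against the text from the question. The class name GreedyVsReluctant and the helper findAll are made up for illustration; only java.util.regex.Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name, not part of the original question.
class GreedyVsReluctant {

    // Hypothetical helper: collects every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "this should match from A to the first B and not 2nd B, got that?";

        // Greedy: .* first runs to the end of the line, then backtracks to the last B.
        System.out.println("greedy    = " + findAll("\\b(A.*B)\\b", text));
        // prints: greedy    = [A to the first B and not 2nd B]

        // Reluctant: .*? grows one character at a time and stops at the first B
        // that is followed by a word boundary.
        System.out.println("reluctant = " + findAll("\\b(A.*?B)\\b", text));
        // prints: reluctant = [A to the first B]
    }
}
```

The only change between the two calls is the extra ? after the *, so the fix can be dropped into the original loop without touching anything else.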

See Reluctant Quantifiers in the docs, and this tutorial.

Don't use a greedy expression for the match, i.e.:

Pattern ptrn = Pattern.compile("\\b(A.*?B)\\b");

* is a greedy quantifier: it matches as many characters as possible while still satisfying the pattern, which in your example means up to the last occurrence of B. That is why you need the reluctant version, *?, which matches as few characters as possible. So your pattern needs only a slight change:

Pattern ptrn = Pattern.compile("\\b(A.*?B)\\b");

See “reluctant quantifiers” in the docs, and this tutorial.

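Perhaps more explicit than making * reluctant/lazy is to say that you are looking for A, followed by a bunch of stuff that is not B, followed by B: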

Pattern ptrn = Pattern.compile("\\b(A[^B]*B)\\b");
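The negated character class and the reluctant quantifier agree on the text from the question, but they are not interchangeable on every input. The sketch below compares them on three sample strings; the class name, the constant names, the firstMatch helper and the second and third sample strings are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name, not part of the original answers.
class NegatedClassVsReluctant {

    static final Pattern RELUCTANT = Pattern.compile("\\b(A.*?B)\\b");
    static final Pattern NEGATED = Pattern.compile("\\b(A[^B]*B)\\b");

    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            // Both patterns stop at the first B here.
            "from A to the first B and not 2nd B",
            // The first B starts the word "Bob", so \b fails right after it:
            // .*? keeps expanding to the final B, while [^B]* cannot step over a B.
            "A Bob B",
            // By default . does not match line terminators, but [^B] does.
            "A to\nB"
        };
        for (String sample : samples) {
            System.out.println("input     = " + sample.replace("\n", "\\n"));
            System.out.println("reluctant = " + firstMatch(RELUCTANT, sample));
            System.out.println("negated   = " + firstMatch(NEGATED, sample));
        }
    }
}
```

Which behaviour is preferable depends on the data: [^B]* never lets the match cross a B, whereas .*? only guarantees the shortest match that still satisfies the rest of the pattern.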


 