
Why doesn't this code work properly?

This code:

String keyword = "pattern";
String text    = "sometextpatternsometext";
String patternStr = "^.*" + keyword + ".*$";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    int start = matcher.start();
    int end = matcher.end();
    System.out.println("start = " + start + ", end = " + end);
}

start = 0, end = 23

doesn't work properly.

But this code:

String keyword = "pattern";
String text    = "sometext pattern sometext";
String patternStr = "\\b" + keyword + "\\b";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    int start = matcher.start();
    int end = matcher.end();
    System.out.println("start = " + start + ", end = " + end);
}

start = 9, end = 16

works fine.

It does work. Your pattern

^.*pattern.*$

says to match:

  1. start at the beginning
  2. accept any number of characters
  3. followed by the string pattern
  4. followed by any number of characters
  5. until the end of the string

The result is the entire input string. If you wanted to find only the word "pattern", then the regex would be just the word by itself or, as you found, the word bracketed with word-boundary metacharacters.
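As a rough illustration of "just the word by itself", here is a minimal sketch. The class name KeywordOnly and the helper firstMatch are made up for this example, and Pattern.quote is an addition that was not in the question (it only guards against keywords that contain regex metacharacters).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordOnly {
    // Hypothetical helper: returns {start, end} of the first match, or null if there is none.
    static int[] firstMatch(String keyword, String text) {
        // Pattern.quote makes the keyword a literal; no ^.* or .*$ around it.
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? new int[] {matcher.start(), matcher.end()} : null;
    }

    public static void main(String[] args) {
        int[] span = firstMatch("pattern", "sometextpatternsometext");
        // Prints: start = 8, end = 15
        System.out.println("start = " + span[0] + ", end = " + span[1]);
    }
}
```

With the text from the question this reports only the position of the keyword instead of the whole line.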

It is not that the first example didn't work; you inadvertently asked it to match more than you meant.

The .* expressions expand to contain all the characters before "pattern" and all the characters after "pattern", so the whole expression matches the whole line.

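With your second example, \b is a zero-width word boundary: it only asserts that "pattern" is not directly preceded or followed by a word character, and it consumes nothing itself. The expression therefore matches exactly "pattern" (start = 9, end = 16), without the surrounding spaces.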
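One way to check what \b actually consumes is to print the matched text itself rather than its offsets. The sketch below does that for both input strings from the question; the class name WordBoundaryDemo and the helper matchedText are hypothetical names introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns the text of the first match, or null if nothing matches.
    static String matchedText(String regex, String text) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String regex = "\\bpattern\\b";
        // The match is the 7-character word only; the neighbouring spaces are not part of it.
        System.out.println("[" + matchedText(regex, "sometext pattern sometext") + "]");
        // No word boundary exists between 't' and 'p', so there is no match here.
        System.out.println(matchedText(regex, "sometextpatternsometext"));
    }
}
```

The second call also shows why the word-boundary version would not help with the first input string: there the keyword is embedded in a longer word.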

The problem is in your regex: "^.*" + keyword + ".*$"

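The expression .* is greedy: it first consumes as many characters as it can, which is the whole string, and then backtracks until your keyword can match. The keyword is found, but because of ^ and .*$ the match still covers the whole string.

Adding a question mark after .* makes it reluctant (non-greedy) rather than greedy:

"^.*?" + keyword + ".*$"

This time .*? matches as few characters as possible before your keyword. The overall match is still start = 0, end = 23, because the trailing .*$ consumes the rest of the string; to get only the keyword's position, drop the ^.* and .*$ parts.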
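To see what the quantifier changes in practice, here is a small sketch under two assumptions that are not in the question: a sample text containing the keyword twice, and a capturing group around the keyword so its position can be read separately from the overall match. GreedyVsReluctant and spans are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns {matchStart, matchEnd, groupStart, groupEnd}, or null if no match.
    static int[] spans(String regex, String text) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new int[] {matcher.start(), matcher.end(), matcher.start(1), matcher.end(1)};
    }

    public static void main(String[] args) {
        String text = "a pattern b pattern c"; // assumed sample input, 21 characters
        int[] greedy = spans("^.*(pattern).*$", text);
        int[] reluctant = spans("^.*?(pattern).*$", text);
        // Both overall matches span 0..21; only the captured keyword differs.
        System.out.println("greedy:    keyword at " + greedy[2] + ".." + greedy[3]);       // 12..19
        System.out.println("reluctant: keyword at " + reluctant[2] + ".." + reluctant[3]); // 2..9
    }
}
```

The greedy form ends up on the last occurrence of the keyword and the reluctant form on the first, while matcher.start() and matcher.end() report the whole line in both cases.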


 