Reluctant Quantifiers in Pattern Matching

I have the following program:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatching {
    public static void main(String[] args) {
        String pattern = "a??";
        Pattern pattern1 = Pattern.compile(pattern);
        String findAgainst = "a";
        Matcher matcher = pattern1.matcher(findAgainst);
        int count = 0;
        // Print every match that find() reports, with its start and end index.
        while (matcher.find()) {
            count++;
            System.out.println(matcher.group(0) + ".start=" + matcher.start() + ".end=" + matcher.end());
        }
        System.out.println(count);
    }
}

which prints the following output:

.start=0.end=0
.start=1.end=1
2

instead of the output I expected:

.start=0.end=0
a.start=0.end=1
.start=1.end=1
3

When I run the program with the pattern "b??", the output is

.start=0.end=0
.start=1.end=1
2

which is correct. What would be the reason for the incorrect output with "a??", even though it is a reluctant quantifier?

From what I see, the issue is how the Java regex engine handles a zero-length match: when the previous match was empty (its start and end indices coincide), the next find() call begins its search one position further along instead of at the same index.

Thus, when a?? matched the empty string before a, the regex engine recorded a zero-length match at index 0 and moved the search index to the position after a, skipping the match you expected.
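
To make the stepping rule concrete, here is a minimal sketch that redoes the find() loop by hand with Matcher.find(int). The class name ZeroLengthStep and the helper manualFindAll are made up for illustration, and the loop only approximates what find() does internally for simple patterns like these; it is not the JDK's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthStep {
    // Hypothetical helper: imitates the find() loop by restarting the search
    // at an explicit index with find(int).
    static List<String> manualFindAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int from = 0;
        // find(int) throws if the index is past the end of the input, so check first.
        while (from <= input.length() && m.find(from)) {
            out.add("'" + m.group() + "'@" + m.start() + "-" + m.end());
            // After an empty match, step one position forward so the loop makes
            // progress; after a non-empty match, continue from its end.
            from = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(manualFindAll("a??", "a")); // reluctant
        System.out.println(manualFindAll("a?", "a"));  // greedy
    }
}
```

With the reluctant pattern the search jumps from index 0 straight to index 1, so a is never consumed; with the greedy pattern the first match already covers a.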

If you use the greedy version, a?, the output is different:

a.start=0.end=1
.start=1.end=1
2

This happens because the first match consumes a, so the search index is already after a, and the engine can then match the empty string at the end of the input.
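
As a side note, a reluctant quantifier does consume a when the rest of the pattern, or a full-input match, cannot succeed otherwise. The sketch below illustrates this; the class name ReluctantExpansion, the helper findAll, and the extra pattern "a??b" are assumptions added for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantExpansion {
    // Hypothetical helper: collects the text of every match that find() reports.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Nothing forces a?? to take a character, so both matches are empty.
        System.out.println(findAll("a??", "a"));
        // The trailing b can only match if a?? consumes the a first.
        System.out.println(findAll("a??b", "ab"));
        // matches() requires the whole input to match, so a?? must consume a.
        System.out.println(Pattern.matches("a??", "a"));
    }
}
```

So the reluctant quantifier is working as designed in the original program: it prefers the empty match, and find() accepts it because nothing after a?? requires more.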
