I want to understand how the Java regular expression program below works. I am not able to understand the second line of its output.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        Pattern r = Pattern.compile(pattern);
        // Now create the matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        }
    }
}
This produces the following output:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
I understand that the pattern we are searching for in the string is a sequence of one or more digits (\\d+) with anything before it (.*) and after it (.*). Please correct me if I am wrong here.
I understand that m.group(0) returns the whole match, which here is the whole string. I don't understand the second line of the output, Found value: This order was placed for QT300. What is happening here?
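It's returning the match produced by the first capturing group, (.*). Since * is a greedy quantifier by default, that group first consumes the whole line and then gives back characters one at a time until the rest of the pattern can match. \\d+ needs only one digit, so the group gives back just the final 0 and keeps everything up to the last digit in the string.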
Breaking it down:
m.group(0) → Entire match → (.*)(\\d+)(.*) // This order was placed for QT3000! OK?
m.group(1) → Capture Group 1 → (.*) // This order was placed for QT300
m.group(2) → Capture Group 2 → (\\d+) // 0
m.group(3) → Capture Group 3 → (.*) // ! OK?
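The breakdown above can be checked with a short program. The following is a minimal sketch, not part of the original question: the class name GreedyVsReluctant and the helper groups are made up for illustration. It also runs a variant of the pattern that uses the reluctant quantifier .*? in the first group, on the assumption that the asker may have expected group 2 to contain the full number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns groups 0..groupCount() of the first match,
    // or null if the regex does not match anywhere in the input.
    public static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";

        // Greedy first group: keeps as much as it can, leaves one digit for \d+.
        String[] greedy = groups("(.*)(\\d+)(.*)", line);
        System.out.println(greedy[1]); // This order was placed for QT300
        System.out.println(greedy[2]); // 0
        System.out.println(greedy[3]); // ! OK?

        // Reluctant first group: takes as little as it can, so \d+ gets all the digits.
        String[] reluctant = groups("(.*?)(\\d+)(.*)", line);
        System.out.println(reluctant[1]); // This order was placed for QT
        System.out.println(reluctant[2]); // 3000
        System.out.println(reluctant[3]); // ! OK?
    }
}
```

With .*? the first group expands only as far as needed for \\d+ to start matching, and \\d+, still greedy, then takes every digit it can.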
This happens because regex quantifiers are both greedy (they match as many characters as possible) and docile (they give characters back when the rest of the pattern needs them). (Greedy... but Docile)
That pretty much explains the result you got:
Group 1: This order was placed for QT300
Group 2: 0
Group 3: ! OK?
To understand this better, change the one-or-more quantifier (\\d+) to zero-or-more (\\d*). The greedy Group 1 will then take everything, because \\d* can match the empty string and nothing forces (.*) to give any characters back.
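The \\d* experiment can be tried with a few lines of Java. This is a small sketch under the same input as the question. The class name ZeroOrMoreDigits and the helper groups are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroOrMoreDigits {
    // Hypothetical helper: matches (.*)(\d*)(.*) against the input and
    // returns groups 0..3, or null if there is no match.
    public static String[] groups(String input) {
        Matcher m = Pattern.compile("(.*)(\\d*)(.*)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("This order was placed for QT3000! OK?");
        // Group 1 keeps the whole line; groups 2 and 3 match the empty string.
        System.out.println("[" + g[1] + "]"); // [This order was placed for QT3000! OK?]
        System.out.println("[" + g[2] + "]"); // []
        System.out.println("[" + g[3] + "]"); // []
    }
}
```

The brackets in the printed output are only there to make the empty matches visible.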