I was looking at some code on TutorialsPoint and something has been bothering me ever since. Take a look at this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
    public static void main( String args[] ){
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        while(m.find( )) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        }
    }
}
This code successfully prints:
Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?
But given the regex "(.*)(\\d+)(.*)", why doesn't it return other possible outcomes, such as:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
or
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
And if this code isn't suited to do so, how can I write one that finds all possible matches?
This happens because * is greedy, and that is where backtracking comes in.
String :
This order was placed for QT3000! OK?
Regex:
(.*)(\\d+)(.*)
We know that .* is greedy and matches as many characters as possible. So the first .* initially matches every character up to and including the final ?, and then it backtracks in order to let the rest of the pattern match. The next part of the regex is \\d+, so the engine gives back one character at a time until \\d+ can match. That first happens at the last digit: \\d+ matches that single digit, because one digit is enough to satisfy "one or more digits". As a result, the first (.*) captures This order was placed for QT300 and the following (\\d+) captures the digit 0 located just before the ! symbol.
The final (.*) then captures all the remaining characters, that is !<space>OK?. m.group(1) returns the text captured by group 1, m.group(2) the text captured by group 2, and so on.
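The backtracking described above can be made visible by printing where each group starts and ends. This is only a minimal sketch: the class name GreedyGroups and the helpers describe and countMatches are made up for illustration, and only standard java.util.regex calls are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroups {
    // Hypothetical helper: returns "start-end:text" for every group of the first match.
    static String[] describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            out[g - 1] = m.start(g) + "-" + m.end(g) + ":" + m.group(g);
        }
        return out;
    }

    // Hypothetical helper: counts how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String regex = "(.*)(\\d+)(.*)";
        for (String s : describe(regex, line)) {
            System.out.println(s);
        }
        System.out.println("matches: " + countMatches(regex, line));
    }
}
```

The count is the other half of the answer: the first match already consumes the whole line, and find() resumes after the previous match, so the while loop runs only once. The engine reports one match per starting position, not every way the pattern could have matched.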
To get your first desired output, restrict the digit group to exactly two digits:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
while(m.find( )) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
With (.*)(\\d{2}), the engine backtracks until two digits are available for \\d{2} to match.
For your second desired output, change your pattern to this:
String pattern = "(.*?)(\\d+)(.*)";
This gives the following output:
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
The ? after the * makes the * non-greedy (lazy), so it matches as few characters as possible.
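To compare the three variants side by side, a small sketch like the following may help. The class QuantifierComparison and the helper digits are hypothetical names; the patterns are the ones discussed in this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierComparison {
    // Hypothetical helper: returns what group 2 captures in the first match, or null.
    static String digits(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy .* leaves only the last digit for \d+.
        System.out.println(digits("(.*)(\\d+)(.*)", line));
        // Greedy .* must leave exactly two digits for \d{2}.
        System.out.println(digits("(.*)(\\d{2})(.*)", line));
        // Lazy .*? stops at the first digit, so \d+ takes the whole run.
        System.out.println(digits("(.*?)(\\d+)(.*)", line));
    }
}
```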
Use extra capturing groups to get both outputs from a single program.
String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while(m.find( )) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
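If "all possible matches" is taken to mean every way of splitting the line into a prefix, a non-empty run of digits, and a suffix (that reading is an assumption about what the question wants), a single regex match cannot list them, but a loop over each digit run can. The following is only a sketch; the class AllSplits and the method allSplits are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllSplits {
    // Hypothetical helper: lists every {prefix, digits, suffix} split of the input
    // in which "digits" is a non-empty substring of a digit run.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher run = Pattern.compile("\\d+").matcher(input);
        while (run.find()) {
            // Try every start and end position inside the current digit run.
            for (int start = run.start(); start < run.end(); start++) {
                for (int end = start + 1; end <= run.end(); end++) {
                    result.add(new String[] {
                        input.substring(0, start),
                        input.substring(start, end),
                        input.substring(end)
                    });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] parts : allSplits(line)) {
            System.out.println("Found value: " + parts[0]);
            System.out.println("Found value: " + parts[1]);
            System.out.println("Found value: " + parts[2]);
        }
    }
}
```

For a run of four digits this yields ten splits, including ones where the suffix starts with digits (for example 3 followed by 000! OK?), since the trailing (.*) would accept those as well.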
(.*?)(\\d+)(.*)
Make your greedy * quantifier non-greedy by writing *? instead.
Because your first group (.*) is greedy, it captures as much as it can and leaves just one 0 for \\d+ to capture. If you make it non-greedy, it will give you the expected result.