Java Regular Expression Matcher doesn't find all possible matches

I was looking at some code on TutorialsPoint and something has been bothering me ever since. Take a look at this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while(m.find( )) {
         System.out.println("Found value: " + m.group(1));
         System.out.println("Found value: " + m.group(2));
         System.out.println("Found value: " + m.group(3));
      }
   }
}

This code successfully prints:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

But given the regex "(.*)(\\d+)(.*)", why doesn't it return other possible outcomes, such as:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

or

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

And if this code isn't suited to do so, how can I write one that finds all possible matches?

It's because of the greediness of * and the backtracking that follows from it.

String:

This order was placed for QT3000! OK?

Regex:

(.*)(\\d+)(.*)

Remember that .* is greedy and matches as many characters as possible. The first .* therefore consumes the whole string, up to and including the final ?, and then backtracks one character at a time so that the rest of the pattern can match. The next element of the regex is \\d+, so the engine backtracks until the character right after the first group is a digit. The first position where that succeeds is the last 0, and \\d+ matches it because its condition is satisfied there (\\d+ matches one or more digits). So the first (.*) captures This order was placed for QT300 and the following (\\d+) captures the single 0 located just before the ! symbol.

The last (.*) then captures all the remaining characters, that is !<space>OK?. m.group(1) returns the text captured by group 1, m.group(2) returns the text captured by group 2, and so on.
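
The walkthrough above can be checked with a small sketch. It is only an illustration: the class name GreedyFindDemo and the helper describeMatches are made up for this example. It records the span of every successful find() call together with the captured groups. Because the greedy match covers the whole input, the loop body runs only once, which is why the original program prints a single set of values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyFindDemo {

    // Hypothetical helper: one entry per successful find() call, formatted as
    // "[start,end) group1 | group2 | ...".
    static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                groups.add(m.group(g));
            }
            result.add("[" + m.start() + "," + m.end() + ") " + String.join(" | ", groups));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String description : describeMatches("(.*)(\\d+)(.*)", line)) {
            System.out.println(description);
        }
    }
}
```

The single reported span runs from index 0 to the end of the string, so the next find() call starts at the end of the input and has nothing left to match.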


To get the second output you asked for, match exactly two digits:

String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

With (.*)(\\d{2}), the engine backtracks until exactly two digits follow the first group.
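
The same idea can be pushed further to list every split of the digit run that the question asks about. The following is a rough sketch, not the only way to do it: the class DigitSplitDemo, the method splits and the maxDigits parameter are names invented for this illustration. It builds the pattern (.*)(\\d{n})(.*) for n = 1, 2, 3, ... and keeps the groups of each pattern that matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitSplitDemo {

    // Hypothetical helper: for each n from 1 to maxDigits, try the pattern
    // (.*)(\d{n})(.*) and collect groups 1..3 of the first match, if any.
    static List<String[]> splits(String input, int maxDigits) {
        List<String[]> result = new ArrayList<>();
        for (int n = 1; n <= maxDigits; n++) {
            Pattern p = Pattern.compile("(.*)(\\d{" + n + "})(.*)");
            Matcher m = p.matcher(input);
            if (m.find()) {
                result.add(new String[] { m.group(1), m.group(2), m.group(3) });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] groups : splits(line, 5)) {
            System.out.println("Found value: " + groups[0]);
            System.out.println("Found value: " + groups[1]);
            System.out.println("Found value: " + groups[2]);
        }
    }
}
```

For this input the digit run is four characters long, so n = 5 produces no match and only four splits are collected, with the digit group growing from 0 to 3000.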

To get your third output, change your pattern to this:

String pattern = "(.*?)(\\d+)(.*)";

This produces:

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

The ? after the * makes the * non-greedy (lazy), so it matches as few characters as possible.
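
To see how each quantifier choice changes the captured groups, here is a short comparison sketch. The class LazyVsGreedyDemo and the helper firstMatch are hypothetical names used only for this example. The third pattern, with a lazy \\d+? as well, is an extra case that is not part of the answer; it is included to show that laziness applies to each quantifier separately.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedyDemo {

    // Hypothetical helper: groups 1..n of the first match, or null if there is none.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            groups[g - 1] = m.group(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String[] patterns = {
            "(.*)(\\d+)(.*)",   // greedy first group: leaves a single digit for \d+
            "(.*?)(\\d+)(.*)",  // lazy first group: \d+ takes the whole digit run
            "(.*?)(\\d+?)(.*)"  // both lazy: \d+? takes only the first digit
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + String.join(" | ", firstMatch(p, line)));
        }
    }
}
```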

Use extra capturing groups to get both outputs from a single program.

String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
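
The group numbers used in that program follow from the nesting: capturing groups are numbered by the position of their opening parenthesis, and (?:...) does not get a number. As a quick way to inspect this, the sketch below dumps every group of the first match; the class GroupNumberingDemo and the helper allGroups are assumed names for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {

    // Hypothetical helper: returns group 0 (the whole match) through group n
    // for the first match, or an empty array if nothing matches.
    static String[] allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int g = 0; g <= m.groupCount(); g++) {
            groups[g] = m.group(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String[] groups = allGroups("((.*?)(\\d{2}))(?:(\\d{2})(.*))", line);
        for (int g = 0; g < groups.length; g++) {
            System.out.println("group " + g + ": " + groups[g]);
        }
    }
}
```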

(.*?)(\\d+)(.*)

Make your greedy * quantifier non-greedy by writing *? instead.

Because your first group (.*) is greedy, it captures everything it can and leaves just one 0 for \\d+ to capture. If you make it non-greedy, it gives you the expected result. See the demo:

https://regex101.com/r/tX2bH4/53
