Java Regex for first matching String

I have a string, hhht. I need to extract 12121212 and 56565656 from it. What I have tried so far is shown below.

String hhht = "dhdhdh<a:Rakesh>12121212</a:Rakesh>sdsdvsdvsvvsv"+"sfsf"+"<a:Rakesh>56565656</a:Rakesh>zvnbjvbj";

Pattern pattern    = Pattern.compile("<a:Rakesh>(.+)</a:Rakesh>");
Matcher matcher    = pattern.matcher(hhht);

for(int hh = 0 ;hh <matcher.groupCount(); hh++){
    if(matcher.find())
        System.out.println(matcher.group(hh+1));

}

I got this output:

12121212</a:Rakesh>sdsdvsdvsvvsvsfsf<a:Rakesh>56565656

i.e., the match starts at the first <a:Rakesh> tag and runs through to the last </a:Rakesh> tag, instead of stopping at the first closing tag.

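  1. Use a non-greedy (reluctant) quantifier, with the DOTALL flag so that . also matches line breaks:

     Pattern pattern = Pattern.compile("(?s)<a:Rakesh>(.+?)</a:Rakesh>");
  2. matcher.groupCount() returns the number of capturing groups in the pattern (1 here), not the number of matches, so it should not drive the loop over matches. Call find() first, then read the groups of that match.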

Use it like this:

if (matcher.find()) {
    // Group 0 is the whole match; capturing groups are numbered 1..groupCount().
    for (int hh = 1; hh <= matcher.groupCount(); hh++) {
        System.out.println(matcher.group(hh));
    }
}
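
To make the difference between groups and matches concrete, here is a minimal sketch. The class name GroupCountDemo and the helper methods groupsIn and matchesIn are made up for illustration; only Pattern, Matcher, groupCount() and find() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: number of capturing groups in the regex.
    // groupCount() is a property of the pattern, so no find() call is needed.
    static int groupsIn(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.groupCount();
    }

    // Hypothetical helper: number of times the regex matches in the input.
    static int matchesIn(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String hhht = "dhdhdh<a:Rakesh>12121212</a:Rakesh>sdsdvsdvsvvsv" + "sfsf"
                + "<a:Rakesh>56565656</a:Rakesh>zvnbjvbj";
        String regex = "(?s)<a:Rakesh>(.+?)</a:Rakesh>";
        System.out.println(groupsIn(regex, hhht));  // 1 (one capturing group)
        System.out.println(matchesIn(regex, hhht)); // 2 (two matches)
    }
}
```

The pattern has one capturing group but matches twice, which is why looping up to groupCount() in the question only ever ran a single iteration.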

You have a greedy quantifier that is not limited to matching digits, so it matches as much as possible. Since the string contains two pairs of tags, it grabs every character between the opening of the first tag and the closing of the second.

You can make it non-greedy (it will then stop as early as possible, at the first </a:Rakesh>), or make it match only digits (which cannot match </a:Rakesh>, so it stops at that point).

This matches only numbers:

"<a:Rakesh>(\\d+)</a:Rakesh>"

This is the non-greedy approach:

"<a:Rakesh>(.+?)</a:Rakesh>"

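As a rough comparison, the sketch below runs the original greedy pattern and the two alternatives against the string from the question and prints group 1 of the first match. The class name GreedyVsReluctant and the helper firstGroup are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String hhht = "dhdhdh<a:Rakesh>12121212</a:Rakesh>sdsdvsdvsvvsv" + "sfsf"
                + "<a:Rakesh>56565656</a:Rakesh>zvnbjvbj";

        // Greedy: runs on to the last closing tag.
        System.out.println(firstGroup("<a:Rakesh>(.+)</a:Rakesh>", hhht));
        // Non-greedy: stops at the first closing tag.
        System.out.println(firstGroup("<a:Rakesh>(.+?)</a:Rakesh>", hhht));
        // Digits only: cannot run past the first closing tag.
        System.out.println(firstGroup("<a:Rakesh>(\\d+)</a:Rakesh>", hhht));
    }
}
```

The first line printed is the long string from the question; the other two print 12121212. The digits-only form is stricter, so it is the better choice only if the tag content is always numeric.
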
This is caused by greedy matching. Use this pattern instead:

Pattern pattern = Pattern.compile("<a:Rakesh>(.+?)</a:Rakesh>");

For more information, see this thread.

And you should use a while loop:

    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
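
Putting the pieces together, a complete program might look like the sketch below. The class name ExtractAll and the method extractAll are assumptions made for this example; the pattern and the while loop are the ones suggested above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractAll {
    // Non-greedy pattern, compiled once and reused.
    private static final Pattern TAG = Pattern.compile("<a:Rakesh>(.+?)</a:Rakesh>");

    // Hypothetical helper: collects group 1 of every match, in order of appearance.
    static List<String> extractAll(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = TAG.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String hhht = "dhdhdh<a:Rakesh>12121212</a:Rakesh>sdsdvsdvsvvsv" + "sfsf"
                + "<a:Rakesh>56565656</a:Rakesh>zvnbjvbj";
        System.out.println(extractAll(hhht)); // [12121212, 56565656]
    }
}
```

Each call to find() moves on to the next match, so the loop ends once no further <a:Rakesh> element is found.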

