
Find last occurrence using Java and regular expression

I'm receiving HTML code from XML and trying to find the last <span> in Java. When I run the code, I always get the first span, and groupCount() tells me there is only one match (the first one). I also tried a hard-coded version of the XML (a string variable) and still got the same result.

Here is my code:

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
if (matcher3.find()) {
    int result = matcher3.groupCount();
    String s = matcher3.group(result); // ->> always shows the first result
}

Thanks in advance.

You can call matcher.find() again and it will find the next match. find() does not only tell you whether something was found; each call actively searches for the next match, starting where the previous one ended. After one call, matcher.group() returns the first match; after a second call it returns the second match, and so on. Repeat until find() returns false, then take the last result. Also, you do not need groupCount() here: it returns the number of capturing groups in the pattern, not the number of matches, and this pattern has none, so it is always zero.

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
String s = null;
while (matcher3.find()) {
    s = matcher3.group();
} 
System.out.println(s);

Output: <span>Cat 1 | Cat 2 | Cat 3
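To make the point about groupCount() concrete, here is a minimal sketch. The class name GroupCountDemo and the helper countMatches are made up for this illustration; only Pattern and Matcher come from java.util.regex. It shows that groupCount() depends on the pattern alone, and that counting matches requires calling find() in a loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
final class GroupCountDemo {

    // Counts matches by calling find() until it returns false.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "<li><span>answer 1.</span></li><li><span>answer 2</span></li>";

        // A lookahead (?=...) is not a capturing group, so this pattern has none.
        Pattern noGroups = Pattern.compile("<span.*?(?=</span>)");
        // One pair of plain parentheses means one capturing group.
        Pattern oneGroup = Pattern.compile("<span>(.*?)</span>");

        System.out.println(noGroups.matcher(text).groupCount()); // prints 0
        System.out.println(oneGroup.matcher(text).groupCount()); // prints 1
        System.out.println(countMatches(noGroups, text));        // prints 2
    }
}
```

Because groupCount() is 0 for the pattern in the question, group(groupCount()) is the same as group(0), which is the whole current match.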

If you only want what is inside the <span> tags, you can use the regex "<span>(.*?)</span>" and call matcher3.group(1) to get the text matched by the first pair of parentheses (or put the tags in a lookbehind and a lookahead, but IMHO it's easier this way).
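Putting the loop and the capturing group together might look like the sketch below. The class name LastSpanContent and the method lastSpanContent are illustrative names, not something from the question, and the sketch assumes the spans have no attributes and are not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class LastSpanContent {

    // Group 1 captures the text between <span> and the nearest </span>.
    private static final Pattern SPAN = Pattern.compile("<span>(.*?)</span>");

    // Returns the content of the last <span>...</span>, or null if there is none.
    static String lastSpanContent(String html) {
        Matcher m = SPAN.matcher(html);
        String last = null;
        while (m.find()) {
            last = m.group(1); // overwritten on every match, so the last one wins
        }
        return last;
    }

    public static void main(String[] args) {
        String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li>"
                + "<li><span>answer3.</span></li><li><span>answer 4</span></li></ul>"
                + "<div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
        System.out.println(lastSpanContent(text)); // prints Cat 1 | Cat 2 | Cat 3
    }
}
```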

Try this:

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
String in = null;
// groupCount() is 0 for this pattern, so group(groupCount()) is group(0), the whole match.
// Loop rather than test once, so that in ends up holding the last match.
while (matcher3.find()) {
    in = matcher3.group(matcher3.groupCount());
}

You asked how to find the last occurrence with a regex, but also consider jsoup, a well-tested Java library for parsing HTML. It has already been tested against real-world markup, and the resulting code is easier to read.

See finding last occurrence using jsoup

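Use a greedy quantifier, .* , in front of the tag to find the last occurrence. The greedy .* first consumes the whole input and then backtracks, so the rest of the pattern matches at the last <span> that is followed by a closing </span>. The (?s) flag lets . match line breaks as well.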

(?s)^.*<span[^>]*>(.*?)</span>

The content of the last span is captured in the first group, matcher3.group(1). See the regexplanet demo.
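In Java, the greedy pattern can be applied with a single find() call and no loop. This is only a sketch: the class name GreedyLastSpan and the method lastSpanContent are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class GreedyLastSpan {

    // (?s) makes . match line terminators too; the greedy .* pushes the
    // <span ...> part of the pattern as far to the right as possible.
    private static final Pattern LAST_SPAN =
            Pattern.compile("(?s)^.*<span[^>]*>(.*?)</span>");

    // Returns the content of the last <span ...>...</span>, or null if there is none.
    static String lastSpanContent(String html) {
        Matcher m = LAST_SPAN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li>"
                + "<li><span>answer3.</span></li><li><span>answer 4</span></li></ul>"
                + "<div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
        System.out.println(lastSpanContent(text)); // prints Cat 1 | Cat 2 | Cat 3
    }
}
```

Unlike the plain "<span>" pattern, "<span[^>]*>" also accepts opening tags that carry attributes. On very large inputs the backtracking of the leading .* can cost more than a simple find() loop.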

