Java regular expression issue with a capture group

public void test(){
    String source = "hello<a>goodA</a>boys can <a href=\"www.baidu.com\">goodB</a>\"\n"
                + "                + \"this can help";
    Pattern pattern = Pattern.compile("<a[\\s+.*?>|>](.*?)</a>");
    Matcher matcher = pattern.matcher(source);
    while (matcher.find()){
        System.out.println("laozhu:" + matcher.group(1));
    }
}

Output:

laozhu:goodA
laozhu:href="www.baidu.com">goodB

Why is the second match not laozhu:goodB?
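The output comes from the square brackets. In the question's pattern, [\s+.*?>|>] is a character class, not a group of alternatives, so it matches exactly one character from the set (any whitespace, +, ., *, ?, >, |). For <a> that one character is the >; for <a href=...> it is the space after <a, and the lazy (.*?) then captures everything up to </a>, attributes included. The sketch below demonstrates this with the question's own pattern (the class name CharClassDemo and the helper captures are hypothetical, chosen for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {

    // Collects group(1) of every match of regex in input, in order.
    static List<String> captures(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (matcher.find()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The bracketed part of the question's pattern, tested on its own.
        // Inside [...], "+ . * ? |" are literal characters and \s is any whitespace.
        String cls = "[\\s+.*?>|>]";
        System.out.println(Pattern.matches(cls, ">"));  // true
        System.out.println(Pattern.matches(cls, " "));  // true: \s is in the set
        System.out.println(Pattern.matches(cls, "|"));  // true: | is a literal here
        System.out.println(Pattern.matches(cls, " h")); // false: a class matches ONE char

        // For <a href=...>, "<a" plus the single space satisfies the prefix,
        // so group 1 swallows the attributes as well as the link text.
        String input = "hello<a>goodA</a>boys can <a href=\"www.baidu.com\">goodB</a>";
        System.out.println(captures("<a[\\s+.*?>|>](.*?)</a>", input));
        // prints [goodA, href="www.baidu.com">goodB]
    }
}
```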

Try this regex:

<a(?: .*?)?>(\w+)<\/a>

As a Java string literal (with each backslash doubled), your Pattern becomes:

Pattern pattern = Pattern.compile("<a(?: .*?)?>(\\w+)<\\/a>");

Group 1 then captures goodA and goodB.

For a detailed breakdown of the pattern, see Regex101.
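Below is a self-contained sketch of the question's test with this pattern swapped in. The class name AnchorTextDemo and the helper linkTexts are hypothetical, and the input is trimmed to the part of the question's string that contains the links. The optional non-capturing group (?: .*?)? is only used when the tag has a space followed by attributes, so both <a> and <a href=...> are accepted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorTextDemo {

    // "<a", then an OPTIONAL non-capturing group starting with a space (the
    // attributes), then ">", then one or more word characters captured in
    // group 1, then the closing "</a>".
    private static final Pattern ANCHOR = Pattern.compile("<a(?: .*?)?>(\\w+)<\\/a>");

    // Returns group(1) of every match, in order of appearance.
    static List<String> linkTexts(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        List<String> texts = new ArrayList<>();
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String source = "hello<a>goodA</a>boys can <a href=\"www.baidu.com\">goodB</a>";
        for (String text : linkTexts(source)) {
            System.out.println("laozhu:" + text); // laozhu:goodA, then laozhu:goodB
        }
    }
}
```

Because \w+ accepts only letters, digits and underscores, this pattern suits single-word link texts such as goodA; link text containing spaces or punctuation is not captured by it.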

A simpler alternative uses a lazy match for the tag's attributes:

    Pattern pattern = Pattern.compile("<a.*?>(.*?)</a>");
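A sketch of the lazy-match pattern <a.*?>(.*?)</a> applied to the exact string from the question (the class name LazyAnchorDemo and the helper linkTexts are hypothetical). The first .*? absorbs the tag's attributes, and because group 1 is (.*?) rather than (\w+), link text containing spaces is captured too:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyAnchorDemo {

    // "<a", then a lazy .*? that stops at the earliest ">" which still lets
    // the rest match (the tag's attributes, possibly empty), then the
    // shortest text before "</a>" captured in group 1.
    private static final Pattern ANCHOR = Pattern.compile("<a.*?>(.*?)</a>");

    // Returns group(1) of every match, in order of appearance.
    static List<String> linkTexts(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        List<String> texts = new ArrayList<>();
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        // The exact string from the question.
        String source = "hello<a>goodA</a>boys can <a href=\"www.baidu.com\">goodB</a>\"\n"
                + "                + \"this can help";
        for (String text : linkTexts(source)) {
            System.out.println("laozhu:" + text); // laozhu:goodA, then laozhu:goodB
        }
    }
}
```

By default . does not match line terminators, so with this pattern neither an <a ...> tag nor its link text may span a line break; compiling with Pattern.DOTALL lifts that restriction.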
