Java - Regex for the given string

I have the following HTML fragment:

        <br>
        Date: 2010-06-20,  1:37AM PDT<br>
        <br>
        Daddy: <a href="...">www.google.com</a>
        <br>

I want to extract

Date: 2010-06-20, 1:37AM PDT

and

Daddy: <a href="...">www.google.com</a>

using a Java regex.

Which regex should I use?

This should give you a nice starting point:

    String text = 
    "        <br>\n" +
    "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
    "   <br>    \n" +
    "Daddy: <a href=\"...\">www.google.com</a>   \n" +
    "<br>";

    String[] parts = text.split("(?:\\s*<br>\\s*)+");
    for (String part : parts) {
        System.out.println("[" + part + "]");
    }

This prints (as seen on ideone.com):

[]
[Date: 2010-06-20,  1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]

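This uses String.split(String regex), which returns a String[]. The regex pattern matches "one or more <br> tags, each with optional leading or trailing whitespace". The first element is empty because the input starts with a separator: split keeps a leading empty string but discards trailing ones, which is why there is no [] at the end of the output.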
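To see what the surrounding \s* and the outer + contribute, it can help to compare the pattern with a naive split on the literal tag. The sketch below is illustrative only; the class name SplitComparison and its two helper methods are hypothetical and not part of the original answer.

```java
public class SplitComparison {

    // Hypothetical helper: splits on the bare tag only, so whitespace
    // around each <br> stays attached to the neighbouring pieces.
    static String[] naiveSplit(String html) {
        return html.split("<br>");
    }

    // Hypothetical helper: the pattern from the answer. Whitespace around
    // each tag, and runs of consecutive tags, count as one separator.
    static String[] groupedSplit(String html) {
        return html.split("(?:\\s*<br>\\s*)+");
    }

    public static void main(String[] args) {
        String sample = "a <br> <br> b";

        System.out.println("naive:");
        for (String part : naiveSplit(sample)) {
            System.out.println("[" + part + "]");
        }

        System.out.println("grouped:");
        for (String part : groupedSplit(sample)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

For the sample "a <br> <br> b", the naive split yields three pieces that still carry their whitespace ("a ", " " and " b"), while the grouped pattern yields just "a" and "b".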


Guava alternative

You can also use Splitter from Guava. It is more readable, and omitEmptyStrings() drops the empty leading element for you.

    Splitter splitter = Splitter.on("<br>").trimResults().omitEmptyStrings();
    for (String part : splitter.split(text)) {
        System.out.println("[" + part + "]");
    }

This prints:

[Date: 2010-06-20,  1:37AM PDT]
[Daddy: <a href="...">www.google.com</a>]
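If adding Guava is not an option, roughly the same split, trim, and drop-empties pipeline can be sketched with the standard library alone, assuming Java 8 or later for Pattern.splitAsStream. The class name BrStreamSplit and its split helper are hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BrStreamSplit {

    // "<br>" contains no regex metacharacters, so it matches literally.
    private static final Pattern BR = Pattern.compile("<br>");

    // Hypothetical helper mirroring on("<br>").trimResults().omitEmptyStrings():
    // split on the tag, trim each piece, then discard the empty ones.
    static List<String> split(String html) {
        return BR.splitAsStream(html)
                 .map(String::trim)
                 .filter(part -> !part.isEmpty())
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text =
            "        <br>\n" +
            "        Date: 2010-06-20,  1:37AM PDT<br>   \n" +
            "   <br>    \n" +
            "Daddy: <a href=\"...\">www.google.com</a>   \n" +
            "<br>";

        for (String part : split(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Run against the same text as above, this prints the same two bracketed lines as the Guava version.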
