
Extract text between two (different) HTML tags using jsoup

I have the following snippet of HTML:

<td>
    <span class="detailh2" style="margin:0px">This month: </span>2 145        
    <span class="detailh2">Total: </span> 31 704          
    <span class="detailh2">Last: </span> 30.12.2021          
</td>

My goal is to extract the text that follows the Total: span, so the output should be:

31 704

I tried this:

String total = doc.select("td:contains(Total:)").get(0).ownText();

which returns:

2 145 31 704 30.12.2021

As you can see, all three values are merged into one confusing string. Is there a method that returns them as an array (or list)?

["2 145", "31 704", "30.12.2021"]

(I don't actually need an array; I'm only interested in the Total value.)

Use the Element.nextSibling() method: each value is the text node that directly follows its label span. In the example code below, the values are collected into a List<String>:

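import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

public class ExtractValues {
    public static void main(String[] args) {
        String html = "<td>\n"
                    + "    <span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145 \n"
                    + "    <span class=\"detailh2\">Total: </span> 31 704                         \n"
                    + "    <span class=\"detailh2\">Last: </span> 30.12.2021                      \n"
                    + "</td>";

        List<String> valuesList = new ArrayList<>();

        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("span");
        for (Element a : elements) {
            // The value is the node right after each label span
            Node node = a.nextSibling();
            if (node != null) {
                valuesList.add(node.toString().trim());
            }
        }

        // Display valuesList in the console window:
        for (String value : valuesList) {
            System.out.println(value);
        }
    }
}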

This prints the following to the console:

2 145
31 704
30.12.2021
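If you want all three values as a list, as the question asks, another option is Element.textNodes(), which returns only an element's direct text-node children and skips the span elements. This is a sketch under stated assumptions, not a tested drop-in: it assumes the td sits inside a table (the sample wraps it in <table><tr> so the parser keeps it as a table cell), and the class OwnTextNodes, the field SAMPLE and the helper ownValues are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class OwnTextNodes {
    // Hypothetical sample: the td is wrapped in <table><tr> so it parses as a table cell.
    public static final String SAMPLE = "<table><tr><td>\n"
            + "<span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145\n"
            + "<span class=\"detailh2\">Total: </span> 31 704\n"
            + "<span class=\"detailh2\">Last: </span> 30.12.2021\n"
            + "</td></tr></table>";

    // Hypothetical helper: collects the td's own (non-span) text, one entry per text node.
    public static List<String> ownValues(String html) {
        Document doc = Jsoup.parse(html);
        Element td = doc.selectFirst("td");
        List<String> values = new ArrayList<>();
        if (td == null) {
            return values;
        }
        // textNodes() returns only direct text children; the spans are skipped.
        for (TextNode tn : td.textNodes()) {
            String value = tn.text().trim();
            // Skip whitespace-only nodes such as the newline right after <td>.
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = ownValues(SAMPLE);
        System.out.println(values);
        // The Total value is the second entry in this particular layout.
        System.out.println(Arrays.asList(values.get(1)));
    }
}
```

This relies on the values always being bare text between the spans; if the page ever wraps a value in its own element, it will not show up in textNodes().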

If you only need the value for Total:, you can try this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

public class ExtractTotal {
    public static void main(String[] args) {
        String html = "<td>\n"
                    + "    <span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145 \n"
                    + "    <span class=\"detailh2\">Total: </span> 31 704                         \n"
                    + "    <span class=\"detailh2\">Last: </span> 30.12.2021                      \n"
                    + "</td>";
        String totalValue = "N/A";
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("span");
        for (Element a : elements) {
            // Find the label span whose text is "Total:"
            if (a.text().contains("Total:")) {
                Node node = a.nextSibling();
                if (node != null) {
                    totalValue = "Total: --> " + node.toString().trim();
                }
                break;
            }
        }

        // Display the value in the console window:
        System.out.println(totalValue);
    }
}

The code above prints the following to the console:

Total: --> 31 704
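A shorter variant, sketched here on the assumption that the label text is exactly "Total:", selects the label span directly with jsoup's :containsOwn pseudo-selector and then reads its next sibling. Because the value is a bare text node rather than an element, the sketch checks that the sibling is a TextNode before reading it. The class TotalValue, the field SAMPLE and the method totalValue are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class TotalValue {
    public static final String SAMPLE = "<td>\n"
            + "<span class=\"detailh2\" style=\"margin:0px\">This month: </span>2 145\n"
            + "<span class=\"detailh2\">Total: </span> 31 704\n"
            + "<span class=\"detailh2\">Last: </span> 30.12.2021\n"
            + "</td>";

    // Returns the text right after the "Total:" label, or null if it is not found.
    public static String totalValue(String html) {
        Document doc = Jsoup.parse(html);
        // :containsOwn matches against the span's own text, i.e. the label itself.
        Element label = doc.selectFirst("span.detailh2:containsOwn(Total:)");
        if (label == null) {
            return null;
        }
        Node next = label.nextSibling();
        // The value is a plain text node, so check the type before reading it.
        if (next instanceof TextNode) {
            return ((TextNode) next).text().trim();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(totalValue(SAMPLE));
    }
}
```

Compared with looping over every span, the selector states the intent in one line, and returning null (instead of a placeholder such as "N/A") lets the caller decide how to handle a missing label.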
