How to get specific sub-elements of HTML data using Jsoup

I am trying to get all prices from an HTML file using Jsoup. The simplified HTML is structured something like this:

//some html

<div class="price-point-wrap use-roundtrippricing">
    <div class="price-point-wrap-top use-roundtrippricing">


    <div class="pp-from-total use-roundtrippricing">Roundtrip</div>
    </div>
    <div class="price-point price-point-revised use-roundtrippricing">
        $509
    </div>

    <div class="fare-select-button-div">
        <input type="button" aria-describedby="sr_product_ECONOMY_123-745|1975-UA" value="Select" class="fare-select-button">
        <span class="visuallyhidden">fare for Economy  (lowest)</span>
    </div>

</div>

//some html

 <div class="price-point-wrap use-roundtrippricing">
    <div class="price-point-wrap-top use-roundtrippricing">


    <div class="pp-from-total use-roundtrippricing">Roundtrip</div>
    </div>
    <div class="price-point price-point-revised use-roundtrippricing">
        $1,046
    </div>

    <div class="fare-select-button-div">
        <input type="button" aria-describedby="sr_product_MIN-BUSINESS-OR-FIRST_123-745|1975-UA" value="Select" class="fare-select-button">
        <span class="visuallyhidden">fare for First  (2-cabin, lowest)</span>
    </div>

    <div class="pp-remaining-seats">5 tickets left at this price</div>
</div>

//some html

This is what I have tried so far:

File input = new File("Flights.html");
Document document = Jsoup.parse(input, "UTF-8", "");
Elements prices = document.getElementsByClass("price-point");
for(Element e: prices){
    System.out.println(e.toString());
}

This gives me the following result:

<div class="price-point price-point-revised use-roundtrippricing">
    $509
</div>
<div class="price-point price-point-revised use-roundtrippricing">
    $1,046
</div>
.....

But now I only want prices like:

509
1046

I tried a regex that keeps only the digits, e.toString().replaceAll("\\D+",""), when printing each element. That seems to work, but it is not how I want to achieve it. How can I get only the numbers using Jsoup?

Thanks to the comment from @Eritrean: I needed to use e.text() instead of e.toString(), which gave me:

$509 
$1,046

I still need a regex such as e.text().replaceAll("[$,]", "") to get rid of the dollar signs and the thousands separators.
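
For illustration, here is a minimal sketch of that last cleanup step in plain Java. The class name PriceParser and the method parsePrice are hypothetical names I made up for this example, and the hard-coded strings stand in for the values e.text() returned above. It assumes whole-dollar prices like the ones in the sample HTML; prices with cents would need different handling.

```java
class PriceParser {

    // Removes the dollar sign and thousands separators, then parses the rest.
    // Assumption: the text is a whole-dollar amount such as "$1,046".
    static int parsePrice(String text) {
        String cleaned = text.replaceAll("[$,]", "").trim();
        return Integer.parseInt(cleaned);
    }

    public static void main(String[] args) {
        // Stand-ins for the strings e.text() returns for each price element.
        String[] texts = { "$509", "$1,046" };
        for (String t : texts) {
            System.out.println(parsePrice(t));
        }
    }
}
```

Inside the original loop, this would presumably be called as parsePrice(e.text()). Integer.parseInt throws a NumberFormatException if anything other than digits is left after the cleanup, which makes unexpected price formats easy to spot.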
