How can I read each h3 and the text after it with Jsoup?

I want to read each <h3> and the text between it and the next <h3>, so that I can build a JSON model of the form title: text,text,text, with one entry per h3 and the ad excluded.

{
  "title": "text,text,text",
  "title": "text",
  "title": "text",
  ...
}

How can I do it in this case with Java or Kotlin?

<div class="biri" id="biri">
    <h1>Yoksa Birisi mi itti?</h1>
    <h3>Title</h3>Text,
    <br>Text,
    <br>Text.
    <h3>Title:</h3>Text
    <h3>Title:</h3>Text
    <div class="ad">
        <div style="max-width:336px;">
            <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
            <ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-7180771993103993" data-ad-slot="2897611612" data-ad-format="auto"></ins>
            <script>
                (adsbygoogle = window.adsbygoogle || []).push({});
            </script>
        </div>
    </div>
    <h3>Title</h3>Text:
    <b>Text:</b> (Text
    <br>
</div>

You can get all h3 tags by using Document.select():

// Parse the HTML string and collect the text of every <h3> element.
Document doc = Jsoup.parse(html);
List<String> h3s = doc.select("h3").stream()
        .map(Element::text)
        .collect(Collectors.toList());

This extracts the text of every h3 tag and collects it into a list. The result is:

[Title, Title:, Title:, Title]
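To also capture the text that follows each h3, one option is to walk the sibling nodes after the heading until the next h3 appears, skipping the ad container on the way. The following is only a sketch under assumptions taken from the HTML in the question: the headings are direct children of div#biri and the ad sits in an element with class "ad". The names H3Sections, Section, extract and append are hypothetical and not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

final class H3Sections {

    /** One heading plus the text that follows it (hypothetical holder type). */
    static final class Section {
        final String title;
        final String text;

        Section(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    static List<Section> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<Section> sections = new ArrayList<>();
        // Assumption: every <h3> of interest is a direct child of div#biri.
        for (Element h3 : doc.select("div#biri > h3")) {
            StringBuilder sb = new StringBuilder();
            // Walk the siblings after this <h3> until the next <h3> or the end of the parent.
            for (Node n = h3.nextSibling(); n != null; n = n.nextSibling()) {
                if (n instanceof Element) {
                    Element el = (Element) n;
                    if (el.tagName().equals("h3")) break;   // the next section starts here
                    if (el.hasClass("ad")) continue;        // assumption: ads carry class "ad"
                    append(sb, el.text());                  // e.g. <b>Text:</b>; <br> has no text
                } else if (n instanceof TextNode) {
                    append(sb, ((TextNode) n).text());
                }
            }
            sections.add(new Section(h3.text(), sb.toString()));
        }
        return sections;
    }

    /** Appends a non-blank piece of text, separating pieces with a single space. */
    private static void append(StringBuilder sb, String piece) {
        String trimmed = piece.trim();
        if (trimmed.isEmpty()) return;
        if (sb.length() > 0) sb.append(' ');
        sb.append(trimmed);
    }
}
```

Calling H3Sections.extract(html) on the markup from the question should yield one Section per h3, with the pieces of text between two headings joined by single spaces. The same traversal carries over to Kotlin almost line for line, since it only relies on nextSibling(), tagName(), hasClass() and text().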

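Besides that, the JSON you want to create will not work as written: keys within a JSON object should be unique (most parsers keep only one value when a key is repeated), so you cannot have multiple "title" keys. An array of objects, each with its own title and text, avoids the problem.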
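A minimal sketch of an array-of-objects layout, which sidesteps the duplicate-key problem, could look like this in plain Java. SectionsJson, quote and toJson are hypothetical names, and the "title" and "text" field names are an assumption about the desired shape. In a real project a JSON library would normally take care of the escaping.

```java
import java.util.List;

final class SectionsJson {

    /** Wraps s in double quotes and escapes the characters JSON requires to be escaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Renders {title, text} pairs as a JSON array of objects, so repeated titles are allowed. */
    static String toJson(List<String[]> pairs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < pairs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"title\":").append(quote(pairs.get(i)[0]))
              .append(",\"text\":").append(quote(pairs.get(i)[1]))
              .append('}');
        }
        return sb.append(']').toString();
    }
}
```

Because each heading becomes its own object inside the array, two sections that are both called "Title:" no longer collide.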
