Jsoup get element inside element

I am trying to scrape the following page: https://icobench.com/icos, and I am stuck trying to extract bits of information from elements with the ico_data class. The markup looks like this:

<td class="ico_data">
  <div class="image_box"><a href="/ico/gcox" class="image" style="background-image:url('/images/icos/icons/gcox.jpg');"></a></div>
  <div class="content">
    <a class="name" href="/ico/gcox"><object><a href="/premium" title="Premium" class="premium">&nbsp;</a></object>GCOX</a>
    <p>GCOX is the world's first blockchain-powered platform that allows the popularity of celebrities to be tokenised and listed.<br><br><b>Restrictions KYC:</b> Yes <span class="line">|</span> <b>Whitelist:</b> Yes <span class="line">|</span> <b>Countries:</b>      USA, Singapore</p>
  </div>
  <div class="shw">
    <div class="row"><b>Start:</b> 08 Aug 2018</div>
    <div class="row"><b>End:</b> 31 Aug 2018</div>
    <div class="row"><b>Rate:</b>
      <div class="rate color4">3.9</div>
    </div>
  </div>
</td>

I'd like to extract the name, description, start date, and end date. How do I go about it?

This is my code so far:

Document document = Jsoup.connect("https://icobench.com/icos").userAgent("Mozilla").get();    
Elements companyElements = document.getElementsByClass("ico_data");
for (Element companyElement : companyElements) {
   // do stuff here
}

Thanks,

You can get the start and end dates by selecting the <b> tags with the :contains pseudo-selector and reading the text of their parent element. The name comes from the element with the class "name", and the description from the <p> tag inside the content div.

import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public void extract() {
    try {
        Connection connection = Jsoup.connect("https://icobench.com/icos");
        Document document = connection.get();
        Elements companyElements = document.select(".ico_data");
        for (Element companyElement : companyElements) {
            // select() never returns null; first() returns null when nothing matches
            Element content = companyElement.select(".content").first();
            if (content == null) {
                continue;
            }

            String name = content.select(".name").text();
            String description = content.select("p").text();

            // Each date is the text of the <b> label's parent, minus the label itself
            Element startLabel = companyElement.select("b:contains(Start)").first();
            Element endLabel = companyElement.select("b:contains(End)").first();
            String start = startLabel == null ? ""
                    : startLabel.parent().text().replace(startLabel.text(), "").trim();
            String end = endLabel == null ? ""
                    : endLabel.parent().text().replace(endLabel.text(), "").trim();

            // do stuff here
            System.out.println(name + " | " + start + " | " + end + " | " + description);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
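
As a variation on the approach above, here is a minimal sketch that works on an HTML string instead of a live connection, so it can be tried offline. It assumes a jsoup version that provides selectFirst (1.11.1 or later). The class name IcoRowExtractor, the helper valueAfterLabel, and the shortened sample markup in main are made up for illustration. It reads the dates with ownText(), which returns only the text directly inside the row div and so skips the <b> label without any string replacement.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IcoRowExtractor {

    // Hypothetical helper: finds <b>label</b> inside a .row div and returns
    // the text that sits next to it, or "" when the label is missing.
    public static String valueAfterLabel(Element scope, String label) {
        Element bold = scope.selectFirst(".row > b:contains(" + label + ")");
        return bold == null ? "" : bold.parent().ownText();
    }

    // Extracts name, description, start and end from every td.ico_data cell.
    public static List<Map<String, String>> extract(String html) {
        Document document = Jsoup.parse(html);
        List<Map<String, String>> rows = new ArrayList<>();
        for (Element cell : document.select("td.ico_data")) {
            Element name = cell.selectFirst(".content > a.name");
            Element description = cell.selectFirst(".content > p");

            Map<String, String> row = new LinkedHashMap<>();
            row.put("name", name == null ? "" : name.text());
            row.put("description", description == null ? "" : description.text());
            row.put("start", valueAfterLabel(cell, "Start"));
            row.put("end", valueAfterLabel(cell, "End"));
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened sample markup (an assumption, not the live page). The <td> is
        // wrapped in a table because the HTML parser drops a bare <td> in the body.
        String html = "<table><tr><td class=\"ico_data\">"
                + "<div class=\"content\">"
                + "<a class=\"name\" href=\"/ico/gcox\">GCOX</a>"
                + "<p>Sample description.</p>"
                + "</div>"
                + "<div class=\"shw\">"
                + "<div class=\"row\"><b>Start:</b> 08 Aug 2018</div>"
                + "<div class=\"row\"><b>End:</b> 31 Aug 2018</div>"
                + "</div>"
                + "</td></tr></table>";
        System.out.println(extract(html));
    }
}
```

To use it against the real page, the same extract logic could be fed with the result of Jsoup.connect(...).get().html(), or adapted to take the Document directly. Note that on the live markup the <p> also contains the restrictions line, so text() on it returns more than the one-sentence description.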
