
Using jsoup to get contents of a table

I am trying to use jsoup to grab the contents of a table at a URL into an ArrayList. So far I have hit dead ends when searching online for questions similar to mine. Maybe a fresh pair of eyes will help. This is what I have so far, which is not much. I read somewhere that I need to identify the table id and then use Elements to loop through the tags for each row. If that is true, how do I do it?

try {
    Document doc = Jsoup.connect("http://www.us-proxy.org")
            .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36")
            .get();
    // utilize the fetched html
} catch (Exception e) {
    e.printStackTrace();
}

Here is the relevant part of the HTML source at that URL:

<table cellpadding="0" cellspacing="0" border="0" class="display fpltable" id="proxylisttable">
  <thead>
    <tr>
      <th>IP Address</th>
      <th>Port</th>
      <th>Code</th>
      <th>Country</th>
      <th>Anonymity</th>
      <th>Google</th>
      <th>Https</th>
      <th>Last Checked</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>24.210.34.226</td><td>3128</td><td>US</td><td>United States</td><td>transparent</td><td>no</td><td>no</td><td>18 hours 20 minutes ago</td></tr>
    <tr><td>50.76.49.97</td><td>4444</td><td>US</td><td>United States</td><td>transparent</td><td>no</td><td>no</td><td>18 hours 20 minutes ago</td></tr>
    <!-- ... -->
  </tbody>
  <tfoot>
    <tr>
      <th class="input"><input type="text" /></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </tfoot>
</table>

My desired output should look something like this: proxy: 50.76.49.97 port: 4444 country: United States type: Transparent .....

Any help?

    // Select the table by its exact class attribute value.
    Elements elements = doc.select("table[class=display fpltable]");

    Elements rows = elements.get(0).select("tr");

    for (Element row : rows) {
        // Query the cells once per row; header and footer rows contain no <td> cells.
        Elements cells = row.select("td");

        if (cells.size() == 8) {
            String iPAddress = cells.get(0).text();
            String port = cells.get(1).text();
            String code = cells.get(2).text();
            String country = cells.get(3).text();
            String anonymity = cells.get(4).text();
            String google = cells.get(5).text();
            String https = cells.get(6).text();
            String lastChecked = cells.get(7).text();
        }
    }
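
The loop above leaves each row's values in local variables, while the question asks for an ArrayList and output of the form `proxy: ... port: ... country: ... type: ...`. One possible way to bridge that gap is a small holder class, sketched below. `ProxyEntry`, `fromCells` and `fromRows` are hypothetical names chosen for illustration, not part of jsoup, and the sketch assumes the cell texts have already been extracted as plain strings in table column order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one table row; not part of jsoup.
final class ProxyEntry {
    final String ipAddress;
    final String port;
    final String country;
    final String anonymity;

    ProxyEntry(String ipAddress, String port, String country, String anonymity) {
        this.ipAddress = ipAddress;
        this.port = port;
        this.country = country;
        this.anonymity = anonymity;
    }

    // Builds an entry from the eight cell texts of one row, in table column order.
    // Returns null when the row does not have exactly eight cells (header/footer rows).
    static ProxyEntry fromCells(List<String> cells) {
        if (cells.size() != 8) {
            return null;
        }
        return new ProxyEntry(cells.get(0), cells.get(1), cells.get(3), cells.get(4));
    }

    // Collects every eight-cell row into an ArrayList and skips the rest.
    static ArrayList<ProxyEntry> fromRows(List<List<String>> rows) {
        ArrayList<ProxyEntry> entries = new ArrayList<>();
        for (List<String> cells : rows) {
            ProxyEntry entry = fromCells(cells);
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }

    @Override
    public String toString() {
        return "proxy: " + ipAddress + " port: " + port
                + " country: " + country + " type: " + anonymity;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("IP Address", "Port"), // too short, skipped
                Arrays.asList("50.76.49.97", "4444", "US", "United States",
                        "transparent", "no", "no", "18 hours 20 minutes ago"));
        for (ProxyEntry entry : fromRows(rows)) {
            // prints: proxy: 50.76.49.97 port: 4444 country: United States type: transparent
            System.out.println(entry);
        }
    }
}
```

Inside the jsoup loop, the cell texts for a row could be obtained with `row.select("td").eachText()` and passed to `fromCells`; this assumes a jsoup version that provides `Elements.eachText()`.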

