
Jsoup: get link from table element

Here is part of the table:

<tr>
   <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 3 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=green> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=red> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-19 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 19 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-20 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 20 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-21 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 21 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 7 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 11 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
</tr>

I need to get the URL from the cell whose font color is red:

<p align="center"><font size="1" color=red> 16 </font></td>

I decided to use the jsoup library, and here is what I tried:

Document document = Jsoup.connect(siteUrl).execute().parse();

Element table = document.select("table").get(2);
Elements links = table.getElementsByTag("a");
String date = table.select("*[color*='red']").first().toString();
System.out.println("Date: " + date);
for (Element link: links) {
    String url = link.attr("href");
    String text = link.text();

    System.out.println(text + ", " + url);
}

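But this way I get that element and all of the links separately. I don't think collecting the full list of URLs and then using "date" to find the one I need is the smartest approach. Can anyone advise how I should handle this?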

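I'm assuming the HTML in your example contains typos, since it isn't well formed (for example, the a and p tags are missing their closing tags).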

If the HTML is valid, the following code gets the desired URL once the table element has been selected:

Element redElement = table.select("*[color*='red']").first();

// Get the sibling (a tag) of its parent (p tag) and get the value of href.
String url = redElement.parent().previousElementSibling().attr("href");
System.out.println("URL: " + url);
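
If the markup really is served as posted, with the a and p tags left unclosed, the parser may nest the p inside the a instead of leaving them as siblings, and the parent/sibling lookup above would then miss the link. A minimal sketch that does not depend on that nesting is to walk up from the red font element to its enclosing td and take the first link inside that cell. The class name RedLinkFinder, the method findRedLinkHref, and the shortened sample markup are made up for illustration; selectFirst assumes jsoup 1.11.1 or later.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RedLinkFinder {

    // Returns the href of the link that shares a table cell with the first
    // red <font> element, or null if no such cell or link exists.
    static String findRedLinkHref(String html) {
        Document document = Jsoup.parse(html);

        // Exact match on the color attribute, so "darkred" etc. are not picked up.
        Element red = document.selectFirst("font[color=red]");
        if (red == null) {
            return null;
        }

        // Climb to the nearest enclosing <td>, however the parser nested <a> and <p>.
        Element cell = red;
        while (cell != null && !cell.tagName().equals("td")) {
            cell = cell.parent();
        }
        if (cell == null) {
            return null;
        }

        // The link is somewhere inside that same cell.
        Element link = cell.selectFirst("a[href]");
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the same unclosed style as the question.
        String html = "<table><tr>"
                + "<td><a href=\"tran_TripDetail.php?TripDate=2018-01-15\"><p align=\"center\"><font size=\"1\" color=green> 15 </font></td>"
                + "<td><a href=\"tran_TripDetail.php?TripDate=2018-01-16\"><p align=\"center\"><font size=\"1\" color=red> 16 </font></td>"
                + "</tr></table>";
        System.out.println("URL: " + findRedLinkHref(html));
    }
}
```

With a live page you would pass the fetched document's HTML, or start the search from the table element you already selected instead of the whole document.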

