Jsoup: get link from table element

Here is part of the table:

<tr>
   <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 3 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=green> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=red> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-19 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 19 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-20 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 20 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-21 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 21 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 7 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 11 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
</tr>

I need to get the URL from the element that is colored red:

<p align="center"><font size="1" color=red> 16 </font></td>

I decided to use the jsoup library, and here is what I tried:

Document document = Jsoup.connect(siteUrl).execute().parse();

Element table = document.select("table").get(2);
Elements links = table.getElementsByTag("a");
String date = table.select("*[color*='red']").first().toString();
System.out.println("Date: " + date);
for (Element link: links) {
    String url = link.attr("href");
    String text = link.text();

    System.out.println(text + ", " + url);
}

But this way I only get the red element itself and the list of all links. Collecting every URL and then searching for the right one using "date" does not seem like the smartest approach. Could someone please advise how to handle this task?

I'm assuming the HTML in the example contains a typo, since it is malformed (i.e. the closing tags for the a and p elements are missing).

If the HTML were valid, the following code would get the URL you want after selecting the table element:

Element redElement = table.select("*[color*='red']").first();

// The red font element sits inside a p tag; take that p tag's previous sibling (the a tag) and read its href.
String url = redElement.parent().previousElementSibling().attr("href");
System.out.println("URL: " + url);
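If the page really does serve the markup as posted, the sibling lookup above may not find the link: depending on how the parser repairs the unclosed tags, the a element can end up as an ancestor of the font element rather than a sibling of the p element. A minimal sketch that avoids relying on the exact nesting is to select the cell that contains the red font and then take the link inside that cell. The class name RedLinkFinder, the method findRedLinkHref, and the constant SAMPLE_HTML are hypothetical names used only for illustration; the sample rows are copied from the question and wrapped in a table tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class RedLinkFinder {

    // Two cells copied from the question (closing tags left missing on purpose),
    // wrapped in <table> so the parser keeps the <tr>/<td> structure.
    static final String SAMPLE_HTML =
        "<table><tr>"
        + "<td width=\"30\" class=\"woBorder\">"
        + "<a href=\"tran_TripDetail.php?TripDate=2018-01-15 00:00:00&PHPSESSID=sessionID\">"
        + "<p align=\"center\"><font size=\"1\" color=green> 15 </font></td>"
        + "<td width=\"30\" class=\"woBorder\">"
        + "<a href=\"tran_TripDetail.php?TripDate=2018-01-16 00:00:00&PHPSESSID=sessionID\">"
        + "<p align=\"center\"><font size=\"1\" color=red> 16 </font></td>"
        + "</tr></table>";

    // Returns the href of the link in the cell whose <font> is red, or null if there is none.
    static String findRedLinkHref(String html) {
        Document document = Jsoup.parse(html);
        // td:has(font[color=red]) matches the cell containing the red <font>;
        // "a[href]" then picks the link anywhere inside that same cell, so it does not
        // matter whether the <a> is a sibling or an ancestor of the <p>.
        Element link = document.select("td:has(font[color=red]) a[href]").first();
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        System.out.println("URL: " + findRedLinkHref(SAMPLE_HTML));
    }
}
```

With the code from the question, the same selector can be run on the already selected table element instead of the whole document, i.e. table.select("td:has(font[color=red]) a[href]").first().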
