Jsoup: get link from table element

Here is part of the table:

<tr>
   <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 3 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=green> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=red> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-19 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 19 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-20 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 20 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-01-21 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 21 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 7 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-02-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
   <td class="woBorder">&nbsp</td>     <td width="30" class="woBorder"><p align="center"><font size="3" color="Green"> 11 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-12 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 12 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-13 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 13 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-14 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 14 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-15 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 15 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-16 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 16 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-17 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 17 </font></td>
          <td width="30" class="woBorder"><a href="tran_TripDetail.php?TripDate=2018-03-18 00:00:00&PHPSESSID=sessionID">                      <p align="center"><font size="1" color=black> 18 </font></td>
</tr>

I need to get the URL from the element with the red color:

<p align="center"><font size="1" color=red> 16 </font></td>

I decided to use the jsoup library, and here is what I have tried:

Document document = Jsoup.connect(siteUrl).execute().parse();

Element table = document.select("table").get(2);
Elements links = table.getElementsByTag("a");
String date = table.select("*[color*='red']").first().toString();
System.out.println("Date: " + date);
for (Element link: links) {
    String url = link.attr("href");
    String text = link.text();

    System.out.println(text + ", " + url);
}

But this way I only get the red element itself and, separately, the list of all links. Collecting every URL and then searching for the right one using "date" does not seem like the smartest approach. Could someone please advise how I can handle this?

I'm assuming the HTML in the example contains a typo, since it is malformed (i.e., the closing tags for the a and p elements are missing).

If the HTML were valid, the following code would get the URL you want once the table element has been selected:

Element redElement = table.select("*[color*='red']").first();

// Get the sibling (a tag) of its parent (p tag) and get the value of href.
String url = redElement.parent().previousElementSibling().attr("href");
System.out.println("URL: " + url);
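If the page really does serve the markup as posted, the parser has to guess where the unclosed a and p elements end, and the p may end up nested inside the a rather than next to it. In that case previousElementSibling() would not find the link. A minimal sketch that avoids depending on the exact nesting is below: it finds the red font element, walks up to the nearest enclosing td, and takes the first link inside that cell. The class name RedLinkFinder, the method findRedUrl, and the shortened sample HTML are made up for illustration and are not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RedLinkFinder {

    // Hypothetical helper: returns the href of the link that shares a table
    // cell with the red <font> element, or null if nothing suitable is found.
    static String findRedUrl(String html) {
        Document doc = Jsoup.parse(html);

        // First <font> whose color attribute is "red".
        Element red = doc.select("font[color=red]").first();
        if (red == null) {
            return null;
        }

        // parents() lists ancestors starting with the closest one, so the
        // first <td> reached is the cell that directly contains the red text.
        for (Element ancestor : red.parents()) {
            if (ancestor.tagName().equals("td")) {
                Element link = ancestor.select("a[href]").first();
                return link == null ? null : link.attr("href");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened sample that keeps the unclosed <a> and <p> tags from the question.
        String html = "<table><tr>"
                + "<td><a href='tran_TripDetail.php?TripDate=2018-01-15 00:00:00&PHPSESSID=sessionID'>"
                + "<p align='center'><font size='1' color=green> 15 </font></td>"
                + "<td><a href='tran_TripDetail.php?TripDate=2018-01-16 00:00:00&PHPSESSID=sessionID'>"
                + "<p align='center'><font size='1' color=red> 16 </font></td>"
                + "</tr></table>";

        System.out.println("URL: " + findRedUrl(html));
    }
}
```

Because the lookup is scoped to the enclosing cell, the same method should also work if the site later fixes its markup and closes the a and p tags properly. In the original code you would start from the already selected table element instead of re-parsing the whole document.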
