
Extracting values from a table row based on row data - JSoup

I'm attempting to parse an HTML document using JSoup. What I am trying to do is extract the table data of a specific row. I want to be able to select that row using either the value of the href attribute or the text inside the <a></a> tags.

<tbody>
   <tr class="even">
      <td><a href="link-1">Link_1</a></td>
      <td align="center">9</td>
      <td align="center">9</td>
      <td align="center">2</td>
   </tr>
   <tr class="odd">
      <td><a href="link-2">Link_2</a></td>
      <td align="center">22</td>
      <td align="center">4</td>
      <td align="center">1</td>
   </tr>
   <tr class="even">
      <td><a href="link-3">Link_3</a></td>
      <td align="center">22</td>
      <td align="center">7</td>
      <td align="center">1</td>
   </tr>
</tbody>

Selecting every data cell in the table is easy; I can just use the following:

Document htmlRawData = Jsoup.parse(deviceMetricData.toString());
Elements htmlMetrics = htmlRawData.select("tbody > tr > td[align]");

htmlMetrics.forEach(ele -> System.out.println(ele));

This only works well when the table has a single row. With several rows, selecting one specific row based on the value of its first cell becomes trickier.

Can anyone help get me started or point me in the right direction?

Remember that you can traverse the DOM tree.

If you know that the structure will always be the same (an a inside a td, which is inside a tr), then you can do it as follows:

Element link = document.select("tbody > tr > td > a[href=\"link-1\"]").first();
link.parent().parent().children().forEach(System.out::println);
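
One caveat with the snippet above: first() returns null when no link matches, so link.parent() would then throw a NullPointerException. Below is a minimal sketch of a null-safe variant, assuming jsoup is on the classpath. The names RowByLink, cellsForHref and SAMPLE_HTML are made up for illustration, and the sample markup is wrapped in a <table> element because an HTML parser may discard rows that appear outside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class RowByLink {
    // Sample markup modeled on the question (hypothetical constant).
    static final String SAMPLE_HTML =
        "<table><tbody>"
        + "<tr class=\"even\"><td><a href=\"link-1\">Link_1</a></td>"
        + "<td align=\"center\">9</td><td align=\"center\">9</td><td align=\"center\">2</td></tr>"
        + "<tr class=\"odd\"><td><a href=\"link-2\">Link_2</a></td>"
        + "<td align=\"center\">22</td><td align=\"center\">4</td><td align=\"center\">1</td></tr>"
        + "<tr class=\"even\"><td><a href=\"link-3\">Link_3</a></td>"
        + "<td align=\"center\">22</td><td align=\"center\">7</td><td align=\"center\">1</td></tr>"
        + "</tbody></table>";

    // Returns the text of the td[align] cells in the row whose link has the given href,
    // or an empty list when no such link or row exists.
    static List<String> cellsForHref(String html, String href) {
        Document document = Jsoup.parse(html);
        List<String> values = new ArrayList<>();

        // Matches the whole attribute value rather than a substring.
        Element link = document.getElementsByAttributeValue("href", href).first();
        if (link == null) {
            return values; // no matching link
        }

        // Walk up until the enclosing <tr> is found, instead of assuming two parent() hops.
        Element row = link;
        while (row != null && !row.tagName().equals("tr")) {
            row = row.parent();
        }
        if (row == null) {
            return values; // link is not inside a table row
        }

        for (Element cell : row.select("td[align]")) {
            values.add(cell.text());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(cellsForHref(SAMPLE_HTML, "link-2")); // prints [22, 4, 1]
    }
}
```

Walking up to the nearest tr keeps the code working if the link is later wrapped in another element such as a span.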

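You can also filter all rows by the occurrence of this href value. Note that getElementsByAttributeValueMatching treats its second argument as a regular expression and matches anywhere in the value, so "link-1" would also match an href such as "link-10":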

final Elements rows = document.select("tbody > tr");
rows
    .stream()
    .filter(tr -> !tr.getElementsByAttributeValueMatching("href", "link-1").isEmpty())
    .findFirst()
    .map(Element::children)
    .ifPresent(System.out::println);

Or by using select:

final Elements rows = document.select("tbody > tr");
rows
    .stream()
    .filter(tr -> !tr.select("a[href=\"link-1\"]").isEmpty())
    .findFirst()
    .map(Element::children)
    .ifPresent(System.out::println);
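
The question also asks about selecting the row by the text of the link, which the snippets above do not cover. The following is a rough sketch of two further options, assuming jsoup is on the classpath: moving the row test into the selector with jsoup's :has() pseudo-selector, and filtering rows on an exact match of the link text. The names RowSelectors, cellsByHref, cellsByLinkText and SAMPLE_HTML are hypothetical, and the href is assumed to contain no double quotes because it is pasted into the selector string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class RowSelectors {
    // Sample markup modeled on the question (hypothetical constant).
    static final String SAMPLE_HTML =
        "<table><tbody>"
        + "<tr class=\"even\"><td><a href=\"link-1\">Link_1</a></td>"
        + "<td align=\"center\">9</td><td align=\"center\">9</td><td align=\"center\">2</td></tr>"
        + "<tr class=\"odd\"><td><a href=\"link-2\">Link_2</a></td>"
        + "<td align=\"center\">22</td><td align=\"center\">4</td><td align=\"center\">1</td></tr>"
        + "<tr class=\"even\"><td><a href=\"link-3\">Link_3</a></td>"
        + "<td align=\"center\">22</td><td align=\"center\">7</td><td align=\"center\">1</td></tr>"
        + "</tbody></table>";

    // Select the data cells of the row containing a link with the given href,
    // using :has() so the row filter lives in the selector itself.
    // Assumption: href contains no double quotes.
    static List<String> cellsByHref(String html, String href) {
        Document document = Jsoup.parse(html);
        return document.select("tr:has(a[href=\"" + href + "\"]) > td[align]")
                .stream()
                .map(Element::text)
                .collect(Collectors.toList());
    }

    // Select the data cells of the first row whose link text equals linkText exactly.
    static List<String> cellsByLinkText(String html, String linkText) {
        Document document = Jsoup.parse(html);
        return document.select("tbody > tr")
                .stream()
                .filter(tr -> tr.select("td > a").stream()
                        .anyMatch(a -> a.text().equals(linkText)))
                .findFirst()
                .map(tr -> tr.select("td[align]").stream()
                        .map(Element::text)
                        .collect(Collectors.toList()))
                .orElse(Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(cellsByHref(SAMPLE_HTML, "link-3"));     // prints [22, 7, 1]
        System.out.println(cellsByLinkText(SAMPLE_HTML, "Link_1")); // prints [9, 9, 2]
    }
}
```

The text comparison is done in Java rather than with a :contains-style selector because those selectors match substrings, so "Link_1" could also match a link labelled "Link_10".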
