HREF + TEXT with Jsoup

I have the following HTML page:

 </div><div id="page_content_list01" class="grid_12">
 <h2><strong class="floatleft">TEXT1</strong></h2><br>
    <table>

<tbody>
    <tr>
        <th class="no_width">

<p class="floatleft">Attachments:</p>
        </th>
        <td class="link_azure">   
            <a target="_blank" href="http://www.example.com">TEXT2</a><br/>

        </td>
    </tr>
</tbody>
    </table><h2><strong class="floatleft">TEXT3</strong></h2><br>
    <table>

<tbody>
    <tr>
        <th class="no_width">

<p class="floatleft">Attachments:</p>
        </th>
        <td class="link_azure">   
            <a target="_blank" href="http://www.example2.com">TEXT4</a><br/>

        </td>
    </tr>
</tbody>
    </table><h2><strong class="floatleft">TEXT5</strong></h2><br>
    <table>

<tbody>
    <tr>

Currently I'm doing:

 Elements rows = document.select("div#page_content_list01");

Now I want to select the "TEXT" and the link. I want to make a clickable link, so I'm using:

  for (Element eleme : rows) {
       Elements elements = eleme.select("a");
       for (Element elem : elements) {
            String url = elem.attr("href");
            String title = elem.text();
       }
  }

and I'm getting:

 url = "http://www.example.com";
 title = "TEXT2";

That works, but this way I can't read "TEXT1" and "TEXT3". Can someone help me, please?

I think you need to work on the selectors. First, your primary selector

Elements rows = document.select("div#page_content_list01");

will return a list containing only ONE element, since it selects the div itself, not the tables or table rows. I would instead do this to get all the relevant info:

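Elements tables = document.select("div#page_content_list01 > table");
for (Element table : tables) {
  // A <br> sits between each <h2> and its table in the posted HTML,
  // so step back over siblings until the <h2> is reached.
  Element h2 = table.previousElementSibling();
  while (h2 != null && !h2.tagName().equals("h2")) {
    h2 = h2.previousElementSibling();
  }
  String titleStr = (h2 != null) ? h2.text() : "";
  Element a = table.select("a").first();  // null if the table has no link
  String linkStr = (a != null) ? a.attr("href") : "";
}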

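Note that each h2 is a sibling of its table (both are direct children of the div) rather than being wrapped with it in a common container. This is why I use the previous-sibling lookup. In the posted HTML a <br> also sits between each h2 and its table, so the lookup may need to step back over it. Also note that I wrote this from memory and it is untested, but you should get the idea.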
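
For completeness, here is a minimal self-contained sketch of the sibling-based approach, run against a shortened copy of the HTML from the question. The class name HeadingLinkPairs, the SAMPLE_HTML constant and the extract helper are made up for illustration, and the sketch assumes that every table of interest is a direct child of the div and is preceded somewhere by its own h2:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HeadingLinkPairs {

    // Shortened copy of the markup from the question (two complete sections).
    static final String SAMPLE_HTML =
        "<div id=\"page_content_list01\" class=\"grid_12\">"
        + "<h2><strong class=\"floatleft\">TEXT1</strong></h2><br>"
        + "<table><tbody><tr><th class=\"no_width\"><p class=\"floatleft\">Attachments:</p></th>"
        + "<td class=\"link_azure\"><a target=\"_blank\" href=\"http://www.example.com\">TEXT2</a><br/></td>"
        + "</tr></tbody></table>"
        + "<h2><strong class=\"floatleft\">TEXT3</strong></h2><br>"
        + "<table><tbody><tr><th class=\"no_width\"><p class=\"floatleft\">Attachments:</p></th>"
        + "<td class=\"link_azure\"><a target=\"_blank\" href=\"http://www.example2.com\">TEXT4</a><br/></td>"
        + "</tr></tbody></table>"
        + "</div>";

    // Maps each heading text to the href of the first link in the table that follows it.
    static Map<String, String> extract(String html) {
        Document document = Jsoup.parse(html);
        Map<String, String> pairs = new LinkedHashMap<>();  // keeps document order
        for (Element table : document.select("div#page_content_list01 > table")) {
            // Walk backwards over siblings (here: the <br>) until the <h2> is found.
            Element heading = table.previousElementSibling();
            while (heading != null && !heading.tagName().equals("h2")) {
                heading = heading.previousElementSibling();
            }
            Element link = table.select("a[href]").first();
            // Skip tables that lack either a heading or a link.
            if (heading != null && link != null) {
                pairs.put(heading.text(), link.attr("href"));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        extract(SAMPLE_HTML).forEach((title, url) -> System.out.println(title + " -> " + url));
    }
}
```

With the sample above this should pair "TEXT1" with http://www.example.com and "TEXT3" with http://www.example2.com. If you also need the link text ("TEXT2", "TEXT4"), link.text() gives it, as in the loop from the question.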
