Extracting the paragraph before each HTML table using jsoup

I need to extract the paragraph just before each table, along with the table content, from a website.

I can extract the table data with jsoup easily, but I cannot extract the paragraph that occurs immediately before a table. I tried the following:

1. doc.select("p"), but it returns extra values because some text in the table columns is also inside <p> tags.
2. getElementsByTag, but no luck.

Sample table:

<p>
<a id="table heading" name="table name"></a>
<b>Sports equipments</b>
</p>
 <table width="98%" cellpadding="0" border="1">
 <tbody>
 <tr valign="top" bgcolor="#ffffcc" align="left">
<th width="25%" scope="col">Company</th>
<th width="25%" scope="col">Product</th>
<th width="20%" scope="col">Availability</th>
<th width="55%" scope="col">Related Information</th>
 <th width="20%" scope="col">
</tr>
<tr>
<td width="18%" valign="top" rowspan="2">
<div>
Nike
<br>
1-800-545-8800
<br>
<br>
<br>
</div>
</td>
<td width="10%" valign="top">
<div>sports kit</div>
</td>
<td width="15%" valign="top" rowspan="2">
<div>Available</div>
</td>
<td width="24%" valign="top" rowspan="2">
<div>Product is available and shipping.</div>
</td>
<td width="16%" valign="top" rowspan="2">Demand increase.</td>
<td width="12%" valign="top" rowspan="2">
<div>
<div>3/26/2014</div>
</td>
</tr>
</table>

I have to extract:

<b>Sports equipments</b>

along with the table content.

You can extend your selector to "p > b".

Since I don't have your full HTML, it's hard to say whether it will work there, but it does for your example:

    final String html = ... // the html of your example
    Document doc = Jsoup.parse(html);

    /*
     * Selects b tags that are direct children of p tags.
     */
    for( Element element : doc.select("p > b") )
    {
        System.out.println(element);
    }

This prints:

<b>Sports equipments</b>
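
The question mentions that some table cells also contain <p> tags, so "p > b" on its own may still match bold text inside a table. A minimal sketch of one way to filter those matches out, assuming jsoup is on the classpath; the class name HeadingFilter, the method boldOutsideTables, and the sample HTML are hypothetical and used only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HeadingFilter {

    /** Returns the text of every <b> that is a direct child of a <p> and not inside a table. */
    static List<String> boldOutsideTables(String html) {
        Document doc = Jsoup.parse(html);
        List<String> headings = new ArrayList<>();
        for (Element bold : doc.select("p > b")) {
            // parents() lists all ancestors; is("table") is true if any ancestor is a <table>.
            if (!bold.parents().is("table")) {
                headings.add(bold.text());
            }
        }
        return headings;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one heading before the table, one bold paragraph inside a cell.
        String html = "<p><b>Sports equipments</b></p>"
                + "<table><tr><td><p><b>Nike</b></p></td></tr></table>";
        System.out.println(boldOutsideTables(html));
    }
}
```

This only drops matches nested in a table; it does not check that the paragraph is directly followed by a table.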
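
    // Note: Jsoup.connect expects the page URL, so `html` must hold a URL here.
    Document doc = Jsoup.connect(html).get();
    Elements tables = doc.select("table");
    for (Element table : tables) {
        // The element immediately preceding the table, or null if there is none.
        Element para = table.previousElementSibling();
        if (para != null) {
            System.out.println(para.text());
        }
    }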
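
Since the question asks for the paragraph together with the table content, the previous-sibling approach can be sketched as a self-contained program that pairs each heading with its table text. This assumes jsoup is on the classpath; the class name TableHeadings, the method headingsWithTables, and the shortened sample HTML are hypothetical. It parses a string instead of fetching a URL so that it runs offline:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class TableHeadings {

    /** Maps the text of the <p> directly before each table to that table's text. */
    static Map<String, String> headingsWithTables(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();
        for (Element table : doc.select("table")) {
            Element previous = table.previousElementSibling();
            // Skip tables with no preceding sibling, or whose preceding sibling is not a <p>.
            if (previous == null || !previous.tagName().equals("p")) {
                continue;
            }
            result.put(previous.text(), table.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the sample table from the question.
        String html = "<p><a name=\"t1\"></a><b>Sports equipments</b></p>"
                + "<table><tr><th>Company</th><th>Product</th></tr>"
                + "<tr><td><p>Nike</p></td><td>sports kit</td></tr></table>";
        headingsWithTables(html).forEach((heading, table) ->
                System.out.println(heading + " -> " + table));
    }
}
```

Using the heading text as the map key means two tables with identical headings would overwrite each other; a list of pairs would avoid that if the real page repeats headings.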
