I am trying to process a large amount of data for a research project. I have a number of HTML files on my computer and I need to read some information from them into a Java program.
I use Jsoup to load the document.
Unfortunately, the table in the HTML has no class or id (and there are multiple tables). I have searched Stack Overflow, but all the answers I find select the table by its class (table.class).
How can I get the date (18/01/2014) from the table below? My doc.select call is not working, I think because of the missing class.
I am trying something like this:
Element table = doc.select("table").first();
Iterator<Element> ite = table.select("td").iterator();
ite.next();
System.out.println("Value 1: " + ite.next().text());
System.out.println("Value 2: " + ite.next().text());
System.out.println("Value 3: " + ite.next().text());
System.out.println("Value 4: " + ite.next().text());
<table border=0 cellpadding=0 cellspacing=0 width=650 height=18><tr><td class="header" style="color:#FFFFFF;"><table border=0 cellpadding=0 cellspacing=0><tr>
<td><img src="/images/title_ultratop.png"></td><td style="color:#FFFFFF;vertical-align:middle;"><b>50 DANCE<br>
<a href="link"><img src="/images/arr_bw.png" border=0 style="margin-bottom:1px;margin-right:3px;"></a>18/01/2014
</b></td></tr></table>
-- EDIT
I found that the table was nested inside another table. Using the code below I can reach it, but I only get one line of output now: the text of the whole table. I still need to get a single element out of it.
Element table = doc.select("table table").first();
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
System.out.println(tds.get(0).text());
}
I guess I am displaying the entire table now. How do I get, say, the 2nd element?
There are some problems in your HTML. I suppose the correct version is:
<table border="1" cellpadding="0" cellspacing="0" width="650" height="18">
<tr>
<td class="header" style="color:#FFFFFF;">
<table border="1" cellpadding="0" cellspacing="0">
<tr>
<td><img src="/images/title_ultratop.png"></td>
<td style="color:#FFFFFF;vertical-align:middle;">
<b>50 DANCE
<br>
<a href="link"><img src="/images/arr_bw.png" border="0"
style="margin-bottom:1px;margin-right:3px;"></a>
18/01/2014
</b>
</td>
</tr>
</table>
</td>
</tr>
</table>
To get that node, select table table td b and then take the child node at index 4, which is a text node (indices are zero-based, so this is the fifth child of the b element):
Elements td = doc.select("table table td b");
TextNode el = (TextNode)td.first().childNode(4);
System.out.println(el.text());
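The fixed index in childNode(4) depends on the exact whitespace and markup inside the b element, so it may break if another file is formatted slightly differently. As a rough sketch of an alternative, assuming the date is always the last non-blank text directly inside the b element, one could walk the element's text nodes from the end. The class name NestedTableDate, the method extractDate and the SAMPLE_HTML constant are made up for this illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

import java.util.List;

public class NestedTableDate {
    // Hypothetical sample, modelled on the corrected HTML above.
    static final String SAMPLE_HTML =
        "<table border=\"1\" width=\"650\"><tr><td class=\"header\">"
        + "<table border=\"1\"><tr>"
        + "<td><img src=\"/images/title_ultratop.png\"></td>"
        + "<td><b>50 DANCE<br>\n"
        + "<a href=\"link\"><img src=\"/images/arr_bw.png\" border=\"0\"></a>18/01/2014\n"
        + "</b></td>"
        + "</tr></table>"
        + "</td></tr></table>";

    // Returns the last non-blank text sitting directly inside the first
    // <b> of a nested table cell, or null if no such text is found.
    static String extractDate(String html) {
        Document doc = Jsoup.parse(html);
        Element bold = doc.select("table table td b").first();
        if (bold == null) {
            return null;
        }
        // textNodes() only returns direct text children, not nested elements.
        List<TextNode> texts = bold.textNodes();
        for (int i = texts.size() - 1; i >= 0; i--) {
            TextNode node = texts.get(i);
            if (!node.isBlank()) {
                return node.text().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractDate(SAMPLE_HTML));
    }
}
```

This trades the hard-coded position for an assumption about where the date sits relative to the other text, so it is worth checking against a few of the real files.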
Right, a third embedded table, and it works.
Element table = doc.select("table table").first();
I still need to select a different table on the site as well. I read about table:contains(word). Hope that will work!
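For what it's worth, here is a minimal sketch of how table:contains(word) might be used. One thing to watch with nested tables: :contains also looks at descendant text, so every enclosing table matches too. The sketch filters those out with :not(:has(table)), which assumes the wanted table has no further tables inside it. The class name TableByText, the method findInnermostTable and the sample HTML are invented for this example:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableByText {
    // Hypothetical sample: one unrelated table, then a nested pair.
    static final String SAMPLE_HTML =
        "<table><tr><td>Other chart</td></tr></table>"
        + "<table><tr><td class=\"header\">"
        + "<table><tr><td><b>50 DANCE<br>18/01/2014</b></td></tr></table>"
        + "</td></tr></table>";

    // Finds the first table whose text contains the given word
    // (case-insensitive) and which has no nested table of its own.
    // Assumption: word contains no parentheses, which would break the query.
    static Element findInnermostTable(Document doc, String word) {
        String query = "table:contains(" + word + "):not(:has(table))";
        return doc.select(query).first(); // null when nothing matches
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Element table = findInnermostTable(doc, "50 DANCE");
        if (table != null) {
            System.out.println(table.text());
        }
    }
}
```

If the word appears in several tables, first() simply takes the earliest one in document order, so a more specific word or an extra selector may be needed.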