Parsing HTML Document with JSoup, can't select table node?
I've already looked at several Stack Overflow topics with similar questions.
I have the following problem: I have a simple HTML page, downloaded and saved locally as an .html file, and I am parsing it with JSoup to read the content of its table. Unfortunately, when I look for my table with .select("table"), it returns no elements.
So I debugged it, and what I noticed is that my body node has a single child node, which appears to be just a string. I assume that is why no table node can be found?
Can anyone help me out, please?
Here is my code snippet:
for (Element table : doc.select("table.creditsuisse")) {
    for (Element row : table.select("tr")) {
        for (Element tds : row.select("td")) {
            // "href" is an attribute, not a tag, so select the anchors that carry it
            for (Element link : tds.select("a[href]")) {
                System.out.println(link.text());
            }
            System.out.println(tds.text());
        }
    }
}
And here is what my input file looks like:
<html>
<head>
</head>
<body>
<table class="creditsuisse" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<th>Name</th>
<th style="width:170px;">Funktion</th>
<th style="width:180px;">
Amtsdauer (Seit) </th>
<th style="width:130px;">Alter (Geburtsdatum)</th>
<th style="width:45px;">Neuigkeit</th>
</tr>
<tr>
<td>
<a onclick="document.location='/u/p/al_thani_jassim_hamad_j_j-9293792/credit_suisse_ag_CH-020.3.923.549-1.htm'" href="/u/p/al_thani_jassim_hamad_j_j-9293792/credit_suisse_ag_CH-020.3.923.549-1.htm">Al-Thani Jassim Hamad J.J.</a> * <br>
</td>
<td>
VR-Mitglied
</td>
<td><a onclick="document.location='http://www.moneyhouse.ch/u/pub/credit_suisse_ag_CH-020.3.923.549-1.htm#28.06.2010'" href="/u/pub/credit_suisse_ag_CH-020.3.923.549-1.htm#28.06.2010">2 Jahre (28.06.2010)</a></td>
<td>-</td>
<td align="center"></td>
</tr>
<tr>
<td>
<a onclick="document.location='/u/p/albers_franz-4438178/credit_suisse_ag_CH-020.3.923.549-1.htm'" href="/u/p/albers_franz-4438178/credit_suisse_ag_CH-020.3.923.549-1.htm">Albers Franz</a> * <br>
</td>
<td>
VR-Mitglied
</td>
<td><a onclick="document.location='http://www.moneyhouse.ch/u/pub/credit_suisse_ag_CH-020.3.923.549-1.htm#04.05.1998'" href="/u/pub/credit_suisse_ag_CH-020.3.923.549-1.htm#04.05.1998">14 Jahre (04.05.1998)</a></td>
<td>-</td>
<td align="center"></td>
</tr>
</tbody>
</table>
</body>
</html>
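To read a local file with JSoup, you need to use the parse method that takes a File object rather than the one that takes HTML content. Jsoup.parse(String, String) treats its first argument as the HTML itself and its second as the base URI, so the file path is parsed as plain text and ends up as the single text node you saw in the body. Replace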
Document doc = Jsoup.parse("C:\\...\\myFile.html", "UTF-8");
with
Document doc = Jsoup.parse(new File("C:\\...\\myFile.html"), "UTF-8");
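To make the difference between the two overloads concrete, here is a minimal sketch, assuming jsoup is on the classpath. The class name ParseOverloadDemo, its helper methods, and the temporary sample file are made up for illustration and are not part of the original question.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseOverloadDemo {

    // Jsoup.parse(String html, String baseUri): the first argument is parsed
    // as markup, so a file path simply becomes text inside <body>.
    static int tablesWhenPathIsParsedAsHtml(String path) {
        Document doc = Jsoup.parse(path, "UTF-8"); // "UTF-8" is taken as the base URI here
        return doc.select("table").size();
    }

    // Jsoup.parse(File in, String charsetName): the file is read from disk
    // and its content is parsed.
    static int tablesWhenFileIsParsed(File file) throws IOException {
        Document doc = Jsoup.parse(file, "UTF-8"); // "UTF-8" is the charset name here
        return doc.select("table").size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration.
        Path sample = Files.createTempFile("sample", ".html");
        String html = "<html><body><table class=\"creditsuisse\">"
                + "<tr><td>VR-Mitglied</td></tr></table></body></html>";
        Files.write(sample, html.getBytes(StandardCharsets.UTF_8));

        System.out.println(tablesWhenPathIsParsedAsHtml(sample.toString())); // 0: the path is just text
        System.out.println(tablesWhenFileIsParsed(sample.toFile()));         // 1: the table is found

        Files.delete(sample);
    }
}
```

Both calls compile with the same argument list apart from the File wrapper, which is probably why the mistake is easy to miss.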
How are you reading your document? If it is just a string, you need to convert it to a Document first. You can try something like this:
Document document = Jsoup.parse(YOUR_STRING);
Elements elements = document.getElementsByTag("table");
Element table = elements.first(); // Elements is a list, not an array; first() returns null if nothing matched
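As a slightly fuller sketch of the string-based route, the example below parses an HTML string, picks the table by its class, and collects the link targets from its cells. It assumes jsoup is on the classpath. The class TableLinksDemo, the method linkTargets, and the shortened sample markup with its /u/p/example-*.htm paths are hypothetical stand-ins, not the original input file.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableLinksDemo {

    // Returns the href of every link found in a cell of table.creditsuisse.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();

        Element table = doc.select("table.creditsuisse").first();
        if (table == null) {
            return hrefs; // no matching table in this document
        }
        // "a[href]" matches anchor elements that have an href attribute.
        for (Element link : table.select("td a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Shortened, made-up markup in the shape of the table from the question.
        String html = "<table class=\"creditsuisse\"><tbody>"
                + "<tr><th>Name</th><th>Funktion</th></tr>"
                + "<tr><td><a href=\"/u/p/example-1.htm\">Person One</a></td><td>VR-Mitglied</td></tr>"
                + "<tr><td><a href=\"/u/p/example-2.htm\">Person Two</a></td><td>VR-Mitglied</td></tr>"
                + "</tbody></table>";

        for (String href : linkTargets(html)) {
            System.out.println(href);
        }
    }
}
```

Checking the result of first() for null before using it avoids a NullPointerException when the document turns out not to contain the table, which is the situation described in the question.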