Unable to extract contents from HTML using jsoup in Java?
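I'm trying to use jsoup to extract the contents of the <td> tags in the HTML below that have the classes css-sched-table-title and css-sched-waypoints, but I can't figure out what is wrong. Can anybody help?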
Document doc = Jsoup.parse("somelink.html");
Elements row = doc.select(".css-sched-table-title td");
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext())
{
    Element element = iterator.next();
    String value = element.text();
    System.out.println("value : " + value);
}
<tr>
<td ALIGN="CENTER" COLSPAN="16" CLASS="css-sched-table-title"><b>Saturday - </b><b>Afternoon</b></td>
</tr>
<tr VALIGN="BOTTOM">
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD>
<TD> </TD>
<TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD>
</tr>
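One more thing worth checking in the question's code: Jsoup.parse(String) treats its argument as HTML markup, not as a file name. Jsoup.parse("somelink.html") therefore produces a document whose only content is the text "somelink.html", and no selector will find a td in it. A minimal sketch, assuming the page is a local file read as UTF-8 (the class name, method names and charset are assumptions), that contrasts this with Jsoup.parse(File, String), which reads the file from disk:

```java
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadSchedule {
    // Jsoup.parse(String) parses the string itself as HTML, so a file name
    // passed here simply becomes the body text of an otherwise empty document.
    static String textOfParsedString(String s) {
        return Jsoup.parse(s).body().text();
    }

    // Reads the file from disk and parses its contents (UTF-8 is an assumption).
    static Document loadFromFile(String path) throws IOException {
        return Jsoup.parse(new File(path), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Prints "somelink.html": the path was parsed as text, never opened.
        System.out.println(textOfParsedString("somelink.html"));
        if (args.length > 0) {
            Document doc = loadFromFile(args[0]);
            // Number of waypoint cells found in the real file.
            System.out.println(doc.select("td.css-sched-waypoints").size());
        }
    }
}
```

For a page on a web server, Jsoup.connect(url).get() fetches and parses it in one step.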
There is only one td tag with the class css-sched-table-title, but there is a whole list of td tags with css-sched-waypoints.
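Also, the correct selector syntax is Elements row = doc.select("td.css-sched-waypoints");. The selector "td.css-sched-waypoints" matches td elements that carry that class, whereas ".css-sched-table-title td" matches td elements nested inside an element with the class css-sched-table-title, and the title cell contains no td.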
Refer here.
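The difference between the two selectors can be seen by counting matches on a trimmed copy of the question's rows. A minimal sketch (the class name, method name and trimmed markup are hypothetical; the rows are wrapped in <table> so jsoup keeps the cells): ".css-sched-table-title td" uses the descendant combinator, while "td.css-sched-table-title" puts both conditions on the same element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Trimmed copy of the question's rows, wrapped in <table> so the cells survive parsing.
    static final String HTML = "<table>"
            + "<tr><td class=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
            + "<tr><td> </td><td class=\"css-sched-waypoints\">Townline and Southern</td>"
            + "<td> </td><td class=\"css-sched-waypoints\">Clearbrook and Blueridge</td></tr>"
            + "</table>";

    // Number of elements the given CSS query matches in HTML.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Descendant combinator: td elements inside a .css-sched-table-title element.
        System.out.println(count(".css-sched-table-title td")); // 0
        // Compound selector: td elements that themselves have the class.
        System.out.println(count("td.css-sched-table-title"));  // 1
        System.out.println(count("td.css-sched-waypoints"));    // 2
    }
}
```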
Note: the HTML file you used is not valid, and jsoup does not interpret it as table content. I had to put the content above inside <table></table> tags.
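The effect of the missing <table> wrapper can be checked directly. jsoup follows the HTML5 parsing rules, under which a bare <tr> or <td> start tag outside a table is a parse error and the tag is dropped, keeping only its text. A minimal sketch (the class and method names are hypothetical) that parses the same cell with and without the wrapper:

```java
import org.jsoup.Jsoup;

public class TableWrapperDemo {
    // One waypoint row from the question, without any enclosing <table>.
    static final String ROW =
            "<tr><td class=\"css-sched-waypoints\">Downtown Abbotsford</td></tr>";

    // Counts the waypoint cells that jsoup actually built as td elements.
    static int countWaypoints(String html) {
        return Jsoup.parse(html).select("td.css-sched-waypoints").size();
    }

    public static void main(String[] args) {
        // Without a table context the tr/td tags are dropped; only the text remains.
        System.out.println(countWaypoints(ROW));
        // Inside <table> the cell is kept (jsoup also inserts an implied tbody).
        System.out.println(countWaypoints("<table>" + ROW + "</table>"));
    }
}
```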
When I try the following code with your HTML file:
// Every td that carries the waypoint class
Elements row = doc.select("td.css-sched-waypoints");
// There is only one title cell, so take the first match
Element title = doc.select("td.css-sched-table-title").first();
System.out.println(title.text());
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String id = element.attr("id"); // empty string: these cells have no id attribute
    String classes = element.attr("class");
    String value = element.text();
    System.out.println("Id : " + id + ", classes : " + classes
            + ", value : " + value);
}
I get:
Saturday - Afternoon
Id : , classes : css-sched-waypoints, value : Townline and Southern
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn
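The same extraction can be written a little more compactly. Because jsoup's Elements class is a List<Element>, a for-each loop can replace the explicit Iterator. A minimal sketch (the class name, method names and the trimmed SAMPLE markup are hypothetical) that collects the title and the waypoint names instead of printing them, and returns null rather than throwing when the title cell is missing:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WaypointExtractor {
    // Trimmed copy of the question's markup, already wrapped in <table>.
    static final String SAMPLE = "<table>"
            + "<tr><td class=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
            + "<tr><td> </td><td class=\"css-sched-waypoints\">Townline and Southern</td>"
            + "<td> </td><td class=\"css-sched-waypoints\">Downtown Abbotsford</td></tr>"
            + "</table>";

    // Text of the single title cell, or null when the page has none.
    static String title(Document doc) {
        Element cell = doc.select("td.css-sched-table-title").first();
        return cell == null ? null : cell.text();
    }

    // Text of every waypoint cell, in document order.
    static List<String> waypoints(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element td : doc.select("td.css-sched-waypoints")) {
            names.add(td.text());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        System.out.println(title(doc));     // Saturday - Afternoon
        System.out.println(waypoints(doc)); // [Townline and Southern, Downtown Abbotsford]
    }
}
```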