How to read an HTML table with Jsoup
I am trying to read the table with the cities from here.
Essentially, I want all the city names, but I am stuck at the part where I traverse into the table.
The select code:
Element table = rawCities.getElementById("content")
.getElementById("bodyContent")
.getElementById("mw-content-text")
.select("table.wikitable sortable jquery-tablesorter").first()
.select("tbody").first();
The document is downloaded and parsed with Jsoup.connect in another class, and here I am trying to get the city names. When I traverse with selects, I get a NullPointerException here. If I get rid of the
.select("tbody").first()
the program runs, but the debugger shows that the table variable is null. Should I be doing this another way, or did I get something wrong?
If you print rawCities you will most probably not find any element representing a <jquery-tablesorter> tag, so you should remove it from your select. Because that select matches nothing, first() returns null, and the next chained call throws the NullPointerException.
Another problem is that table.wikitable sortable will try to find
<table class="wikitable">
...
<sortable>
...
</table>
not
<table class="wikitable sortable">...
To find an element that has several classes, put the . operator before each class name, as in element.class1.class2, rather than a space, as in element.class1 class2 (a space describes an ancestor-descendant relationship).
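The difference between the two selectors can be checked on a small document. The following is a minimal sketch: the SelectorDemo class, its count helper, and the sample markup are made up for illustration, and it assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical sample markup standing in for the downloaded page.
    static final String HTML =
        "<table class=\"wikitable sortable\"><tbody>"
      + "<tr><td><a href=\"#\">Paris</a></td></tr>"
      + "</tbody></table>";

    // Counts the elements that a CSS query matches in the sample markup.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Dot-chained classes: one <table> element carrying both classes.
        System.out.println(count("table.wikitable.sortable")); // 1
        // Space: a <sortable> element nested inside table.wikitable.
        System.out.println(count("table.wikitable sortable")); // 0
        // first() on an empty result is null, so chaining another call would throw.
        System.out.println(Jsoup.parse(HTML).select("table.wikitable sortable").first());
    }
}
```

Since first() returns null when nothing matches, checking the result for null before chaining further calls avoids the NullPointerException from the question.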
So your code could be simplified to
Element table = rawCities
.select("table.wikitable.sortable tbody")
.first();
Anyway, if you only want to print the content of the first column of the selected table, you can do it with
for (Element row : rawCities.select("table.wikitable.sortable td:eq(0) a")) {
System.out.println(row.text());
}
You can also use this loop to add the results of row.text() to some List<String> created earlier, or use code like
List<String> names = rawCities
.select("table.wikitable.sortable td:eq(0) a")
.stream()
.map(e -> e.text())
.collect(Collectors.toList());
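For completeness, here is a self-contained sketch of the stream version with the imports it needs. The CityNames class, the firstColumn helper, and the SAMPLE markup are hypothetical; real code would pass in the document obtained from Jsoup.connect instead of the sample string. It assumes the jsoup library is on the classpath.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CityNames {
    // Hypothetical sample markup shaped like a two-column wiki table.
    static final String SAMPLE =
        "<table class=\"wikitable sortable\"><tbody>"
      + "<tr><th>City</th><th>Country</th></tr>"
      + "<tr><td><a href=\"/wiki/Paris\">Paris</a></td><td>France</td></tr>"
      + "<tr><td><a href=\"/wiki/Rome\">Rome</a></td>"
      + "<td><a href=\"/wiki/Italy\">Italy</a></td></tr>"
      + "</tbody></table>";

    // Collects the link text from the first cell of every row.
    // td:eq(0) keeps only cells whose sibling index is 0, so links in later columns are skipped.
    static List<String> firstColumn(Document doc) {
        return doc.select("table.wikitable.sortable td:eq(0) a")
                  .stream()
                  .map(Element::text)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Document rawCities = Jsoup.parse(SAMPLE);
        System.out.println(firstColumn(rawCities)); // [Paris, Rome]
    }
}
```

If the selector matches nothing, the returned list is simply empty, so this version needs no null check.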