Jsoup, parse html table

This is probably a dumb question, but I can't figure it out. I'm trying to parse the HTML output of this page: http://meteo.uwb.edu.pl/

Basically, I need to extract values from the table: the left side (blue text) as keys (headers) and the right side (brown text) as values. Additionally, I need the header labels ("Aktualna pogoda/Weather conditions: ").

My intention is to get the HTML table from the page output and then parse its rows, but I can't work out how, because the page's HTML is rather complicated. This is what I'm starting from:

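Document doc = Jsoup.connect("http://meteo.uwb.edu.pl/").get();
Elements tables = doc.select("table");
// select() on Elements searches every matched table, so take the rows from "tables"
for (Element row : tables.select("tr"))
{
  Elements tds = row.select("td:not([rowspan])");
  // skip rows that do not contain a key cell and a value cell
  if (tds.size() < 2) continue;
  System.out.println(tds.get(0).text() + "->" + tds.get(1).text());
}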

But my result is still a mess. Do you have any ideas how to parse it correctly?

The keys from the first table can be retrieved with this code:

doc.select("table").get(1).select("tbody").get(1).select("tr").get(1).select("td").get(0).select("b")

and the values with this:

doc.select("table").get(1).select("tbody").get(1).select("tr").get(1).select("td").get(1).select("b")

For the second table:

doc.select("table").get(2).select("tbody").get(0).select("tr").get(1).select("td").get(0).select("b")

and

doc.select("table").get(2).select("tbody").get(0).select("tr").get(1).select("td").get(1).select("b")

I managed it this way:

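Document doc = Jsoup.connect("http://meteo.uwb.edu.pl/").get();
// every <td> on the page; cell 2 is read for the headers and cell 3 for the values
Elements cells = doc.select("td");
Elements headers = cells.get(2).select("b");
Elements vals = cells.get(3).select("b");
Map<String, String> all = new HashMap<>();

// pair each header with the value at the same position
for (int i = 0; i < headers.size(); i++) {
    all.put(headers.get(i).text(), vals.get(i).text());
}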

It seems to work OK.
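One caveat with the loop above: it assumes the header cell and the value cell contain the same number of `<b>` elements, and a `HashMap` does not keep the rows in page order. Below is a minimal sketch of the pairing step on its own, using only the standard library. `HeaderValuePairs` and `zipToMap` are hypothetical names, not part of Jsoup, and the sample strings are made up. The two lists are assumed to have been extracted already, for example with `Elements.eachText()` if the Jsoup version in use provides it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderValuePairs {

    // Hypothetical helper (not a Jsoup API): pairs keys.get(i) with values.get(i).
    static Map<String, String> zipToMap(List<String> keys, List<String> values) {
        // Fail early with a clear message instead of an IndexOutOfBoundsException mid-loop
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                "Expected as many values as keys, got " + keys.size()
                + " keys and " + values.size() + " values");
        }
        // LinkedHashMap preserves insertion order, i.e. the order of the rows on the page
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample texts standing in for the extracted header and value strings
        List<String> headers = Arrays.asList("temperature", "humidity");
        List<String> vals = Arrays.asList("12.3 C", "80 %");
        zipToMap(headers, vals).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

If the page layout changes and the two cells no longer line up, this version throws straight away rather than silently producing a partial map.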
