jSoup get data using td-class tags from webpage
I would like to get data from http://www.futbol24.com/Live/?__igp=1&LiveDate=20141104 using jSoup. I know how to use jSoup, but I am finding it difficult to pinpoint the data that I need.
I would like the Time, Home Team and Away Team from each row of the tbody table. So the output from the first row should be:
08:30 Persipura Jayapura Pelita Bandung Raya
I can see that the td classes of these elements are "status alt", "home" and "guest".
Currently I have tried the code below, but it doesn't seem to output anything... what am I doing wrong?
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.util.ArrayList;
import java.util.List;

matches = new ArrayList<Match>();
//getHistory
String website = "http://www.futbol24.com/Live/?__igp=1&LiveDate=20141104";
Document doc = Jsoup.connect(website).get();
// First tbody in the fetched HTML; its children are the table rows
Element tblHeader = doc.select("tbody").first();
List<Match> data = new ArrayList<>();
for (Element element1 : tblHeader.children()) {
    Match match = new Match();
    match.setTimeOfMatch(element1.select("td.status.alt").text());
    // td.home holds the home team and td.guest the away team
    match.setHomeTeam(element1.select("td.home").text());
    match.setAwayTeam(element1.select("td.guest").text());
    data.add(match);
}
System.out.println(data.toString());
Does anybody know how I can use jSoup to get these elements from each row of the table?
Thanks,
Rob
The content of this site seems to be generated via AJAX. Jsoup can't handle this, since it is not a browser and does not interpret JavaScript. To solve this scraping problem you may need something like Selenium WebDriver. I gave a longer answer to a generalized question about this before.
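To illustrate the point about JavaScript-generated content, here is a minimal sketch of the row extraction run against a static HTML string instead of the live site. The markup in SAMPLE_HTML is hypothetical (only the class names and the first row's values come from the question; the second row is made up), and the names MatchRowDemo and extractRows are invented for this example. It suggests that the selectors themselves are fine once the rows are actually present, and that a tbody without rows, which is roughly what Jsoup would see if the rows are filled in later by a script, yields nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class MatchRowDemo {

    // Hypothetical sample markup; the real page's structure may differ.
    static final String SAMPLE_HTML =
        "<table><tbody>"
      + "<tr><td class=\"status alt\">08:30</td>"
      + "<td class=\"home\">Persipura Jayapura</td>"
      + "<td class=\"guest\">Pelita Bandung Raya</td></tr>"
      + "<tr><td class=\"status alt\">10:00</td>"
      + "<td class=\"home\">Team A</td>"
      + "<td class=\"guest\">Team B</td></tr>"
      + "</tbody></table>";

    // Returns one "time home guest" string per table row found in the HTML.
    static List<String> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> rows = new ArrayList<>();
        for (Element row : doc.select("tbody tr")) {
            // "status alt" is two classes, so the selector chains both.
            String time = row.select("td.status.alt").text();
            String home = row.select("td.home").text();
            String guest = row.select("td.guest").text();
            // Skip rows that contain none of the expected cells.
            if (time.isEmpty() && home.isEmpty() && guest.isEmpty()) {
                continue;
            }
            rows.add(time + " " + home + " " + guest);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Rows present in the markup: each one is printed on its own line.
        extractRows(SAMPLE_HTML).forEach(System.out::println);
        // Empty tbody: no rows are found, so the list size is 0.
        System.out.println(extractRows("<table><tbody></tbody></table>").size());
    }
}
```

If a browser-driven tool is used to load the page, the rendered HTML (for example the string returned by Selenium's getPageSource()) could be passed to extractRows in the same way; whether the live page matches this markup is an assumption.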