Trouble extracting a table from a website with jsoup

I'm working on a project that involves extracting one table from a site that contains several HTML tables. The image below highlights, in a red box, the specific table I want to extract:

[Image: the target table outlined in a red box]

Here is my code:

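import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The page URL (the original variable name "html" was misleading).
String url = "https://finance.yahoo.com/quote/GOOG/analysts?p=GOOG";
try {
    // Fetch and parse the HTML returned for the initial request.
    Document doc = Jsoup.connect(url).get();
    // Pick the eighth <table>; this is the line that throws.
    Element table = doc.select("table").get(7);

    // Print the text of every cell, one cell per line.
    for (Element row : table.select("tr")) {
        Elements cells = row.select("td");
        for (Element cell : cells) {
            System.out.println(cell.text());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}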

However, this code throws an index out of bounds exception when selecting the table. Lowering the index pulls one of the other tables from the page, and I'm not sure how else to select the particular table I want.

The table in question is loaded asynchronously via AJAX, which is why you get an index out of bounds exception: the table is simply not in the DOM when the initial URL is loaded. You should analyze how the page loads using the browser developer tools and find the AJAX call that fetches the data you need. An alternative is to use a different technology, such as Selenium WebDriver, to load the content. Selenium WebDriver drives a real browser, which executes JavaScript, so the full page is loaded and rendered, including all AJAX-loaded content. Good luck.
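
To illustrate the first suggestion, here is a minimal sketch of calling a background endpoint directly once you have found it in the Network tab of the developer tools. It uses only the JDK's java.net.http client (Java 11+). The endpoint URL, the class name AjaxFetchSketch, and the Accept header are all placeholders chosen for illustration; they are not the real request this page makes, so substitute whatever you actually observe in the browser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class AjaxFetchSketch {

    // Hypothetical placeholder, NOT the site's real endpoint: replace it with
    // the request URL shown in the browser's Network tab.
    static final String PLACEHOLDER_ENDPOINT =
            "https://example.com/api/analysts?symbol=GOOG";

    // Builds a GET request similar to the one the page issues in the background.
    // The Accept header is an assumption; copy the headers the browser sends.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body as a string.
    static String fetchBody(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No endpoint supplied: show what would be sent, but stay offline.
            HttpRequest preview = buildRequest(PLACEHOLDER_ENDPOINT);
            System.out.println(preview.method() + " " + preview.uri());
            return;
        }
        // Endpoint supplied on the command line: perform the real request.
        String body = fetchBody(HttpClient.newHttpClient(), buildRequest(args[0]));
        System.out.println(body);
    }
}
```

Run it with the discovered URL as the first argument. If the response turns out to be an HTML fragment rather than JSON, the returned string can be handed to Jsoup.parse(...) and queried with the same selectors as in the question.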
