How do I retrieve data from a Sports Reference data table using JSoup?

Question

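I am trying to use JSoup to retrieve a team's number of wins from a Sports Reference table.

Specifically, I am trying to get the data point highlighted below; the HTML for it is provided as well.

Below is what I have tried so far, but I get a NullPointerException when I try to access this element's text, which tells me my code is probably not parsing the HTML correctly.

Element wins = document.selectFirst("td[data-stat=\"wins\"]");

What I want is for this element's text to be 34 (or some other number, depending on the team's wins).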

Answer 1

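Check what your Document was actually able to read from the page by printing it out. If the content you need is HTML that the browser adds dynamically through JavaScript, you will need a tool such as Selenium instead of Jsoup.

To read the HTML source, you can write something like: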

import java.io.IOException;
import org.jsoup.Jsoup;

public class JSoupHTMLSourceEx {
    public static void main(String[] args) throws IOException {
        String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        String html = Jsoup.connect(webPage).get().html();
        System.out.println(html);
    }
}
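Once the source is printed, a quick string check can tell you whether the markup you are after is missing entirely or is present but sitting inside an HTML comment (some sites ship tables that way and uncomment them with JavaScript; whether this page does is something to verify, not something stated above). The sketch below is only an illustration: `SourceProbe`, `Location` and `locate` are hypothetical names, the sample HTML is made up, and only the first occurrence of the marker is inspected.

```java
/** Hypothetical helper: reports where a marker sits in raw HTML source. */
final class SourceProbe {
    enum Location { ABSENT, IN_COMMENT, IN_MARKUP }

    static Location locate(String html, String marker) {
        int at = html.indexOf(marker);
        if (at < 0) {
            return Location.ABSENT;
        }
        // Nearest comment opener before the marker, if any.
        int open = html.lastIndexOf("<!--", at);
        if (open < 0) {
            return Location.IN_MARKUP;
        }
        // If that comment closes only after the marker (or never), the marker is commented out.
        int close = html.indexOf("-->", open + 4);
        return (close < 0 || close > at) ? Location.IN_COMMENT : Location.IN_MARKUP;
    }

    public static void main(String[] args) {
        // Made-up sample; in practice pass the string returned by Jsoup's html().
        String html = "<div><!-- <table id=\"team_misc\"><tr>"
                + "<td data-stat=\"wins\">34</td></tr></table> --></div>";
        System.out.println(locate(html, "data-stat=\"wins\"")); // prints IN_COMMENT
    }
}
```

A result of ABSENT points toward content rendered by JavaScript, while IN_COMMENT means the data is in the static source but invisible to CSS selectors.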

Since Jsoup supports CSS selectors, you can try to get the element like this:

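import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
    String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
    // get() already returns a parsed Document, so no separate Jsoup.parse() call is needed
    Document document = Jsoup.connect(webPage).get();

    // Second child of the first body row of the team_misc table
    Elements tds = document.select("#team_misc > tbody > tr:nth-child(1) > td:nth-child(2)");
    // If nothing matches, the loop simply prints nothing
    for (Element e : tds) {
        System.out.println(e.text());
    }
}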
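If the selector above matches nothing and the printed source shows the table wrapped in an HTML comment, one workaround people use is to strip the comment delimiters before parsing. This is a rough sketch under that assumption, not a confirmed description of how this page is built; `CommentUnwrapper` and `unwrap` are hypothetical names and the sample string is made up.

```java
/** Hypothetical helper: exposes markup that was shipped inside HTML comments. */
final class CommentUnwrapper {
    // Removes the comment delimiters but keeps whatever was between them.
    static String unwrap(String html) {
        return html.replace("<!--", "").replace("-->", "");
    }

    public static void main(String[] args) {
        String raw = "<div id=\"all_team_misc\"><!--<table id=\"team_misc\"></table>--></div>";
        System.out.println(unwrap(raw));
        // With Jsoup on the classpath, the result could then be parsed as usual:
        // Document document = Jsoup.parse(unwrap(raw));
    }
}
```

Note that this also exposes every genuine comment on the page, so it is best applied only to the fragment that holds the table.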

But a better solution is to use Selenium, a portable framework for testing web applications (see the Selenium documentation for more details about the tool):

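import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SeleniumWinsEx {
    public static void main(String[] args) {
        String baseUrl = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get(baseUrl);
            // Single quotes inside the XPath avoid clashing with the Java string's double quotes
            String innerText = driver.findElement(
                By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();
            System.out.println(innerText);
        } finally {
            // Close the browser even if the element lookup throws
            driver.quit();
        }
    }
}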

Instead of:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();

you can also try this form:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getAttribute("innerHTML");
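To see what that XPath picks out without launching a browser, it can be evaluated against a small fragment using the JDK's built-in javax.xml.xpath API. This is only a sketch: the fragment below is an invented, well-formed stand-in for the real table (the row header is assumed to be a th cell), and `XPathQuotingDemo` and `firstCell` are hypothetical names. It also shows why single quotes are convenient inside the expression: the Java string literal needs no escaping.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathQuotingDemo {
    // Single quotes inside the XPath, double quotes around the Java literal.
    static final String WINS_XPATH = "//*[@id='team_misc']/tbody/tr[1]/td[1]";

    // Returns the text of the first td in the first body row, or "" if nothing matches.
    static String firstCell(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(WINS_XPATH, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented fragment: td[1] skips the th and selects the first td.
        String fragment = "<table id='team_misc'><tbody>"
                + "<tr><th>Team</th><td>34</td><td>other</td></tr>"
                + "</tbody></table>";
        System.out.println(firstCell(fragment)); // prints 34
    }
}
```

The JDK parser only accepts well-formed XML, so this is a way to check the expression itself rather than a replacement for Jsoup or Selenium on real pages.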

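P.S. In the future, it would be useful to add a link to the source you are taking the information from, or at least a snippet of the DOM structure, rather than an image.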
