How to retrieve data from data table from Sports Reference using JSoup?

I'm attempting to use JSoup to retrieve the number of wins for a team from a Sports Reference table.

Specifically, I am trying to retrieve the data point highlighted below, with the HTML code provided.

Below is what I have tried already, but I get a NullPointerException when trying to access the text of this element, which tells me that my code is likely not parsing the HTML correctly.

Element wins = document.selectFirst("td[data-stat=\"wins\"]");

What I want is for the text of this element to be 34 (or some number depending on the number of wins for the team).

Check what your Document was actually able to read from the page by printing it. If the content you need is added dynamically by JavaScript in the browser, Jsoup will not see it, and you need a tool such as Selenium instead.

To read the HTML source, you can write something like:

import java.io.IOException;
import org.jsoup.Jsoup;

public class JSoupHTMLSourceEx {
    public static void main(String[] args) throws IOException {
        String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        String html = Jsoup.connect(webPage).get().html();
        System.out.println(html);
    }
}
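Scanning a full page dump by eye is tedious, so the check can be narrowed to the attribute from the question. The sketch below uses only the standard library and reports whether a marker string such as data-stat="wins" is missing from the fetched source, present as normal markup, or present only inside an HTML comment. The comment case is an assumption about how a page might hide a table from a parser, not something confirmed for this site, and the class and method names (HtmlSourceCheck, locate) are made up for illustration.

```java
public class HtmlSourceCheck {

    /** Where a marker string was found in the raw HTML. */
    public enum Location { ABSENT, IN_COMMENT, IN_MARKUP }

    /**
     * Reports whether marker occurs in html at all, and if so whether
     * every occurrence sits inside an HTML comment.
     */
    public static Location locate(String html, String marker) {
        Location result = Location.ABSENT;
        int idx = html.indexOf(marker);
        while (idx >= 0) {
            // Nearest comment opener and closer that start at or before the match
            int lastOpen = html.lastIndexOf("<!--", idx);
            int lastClose = html.lastIndexOf("-->", idx);
            if (lastOpen < 0 || lastOpen < lastClose) {
                // No unclosed comment precedes this occurrence
                return Location.IN_MARKUP;
            }
            result = Location.IN_COMMENT;
            idx = html.indexOf(marker, idx + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup; in practice pass the html string fetched with Jsoup
        String live = "<table id=\"team_misc\"><tr><td data-stat=\"wins\">34</td></tr></table>";
        String commented = "<div><!-- " + live + " --></div>";
        String marker = "data-stat=\"wins\"";

        System.out.println(locate(live, marker));
        System.out.println(locate(commented, marker));
        System.out.println(locate("<p>no table here</p>", marker));
    }
}
```

If the result is ABSENT, the data is most likely added by JavaScript after the page loads. If it is IN_COMMENT, the markup is in the source but a selector run on the parsed Document will not match it.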

Since Jsoup supports CSS selectors, you can try to get the element like this:

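import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
    String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
    // Jsoup.connect(...).get() already returns a parsed Document
    Document document = Jsoup.connect(webPage).get();
    // First row of the #team_misc table, the cell that is the row's second child
    Elements tds = document.select("#team_misc > tbody > tr:nth-child(1) > td:nth-child(2)");
    for (Element e : tds) {
        System.out.println(e.text());
    }
}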
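If the selector matches nothing, selectFirst returns null and select returns an empty list, which is where the NullPointerException in the question comes from. A hedged sketch of a null-safe lookup follows. As a fallback it also re-parses the contents of HTML comment nodes, on the assumption (not verified for this page) that the table markup might be shipped inside a comment. The class name CommentAwareSelect, the method findText, and the sample HTML are all hypothetical.

```java
import java.util.Optional;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class CommentAwareSelect {

    /** Returns the text of the first match, looking inside HTML comments as a fallback. */
    public static Optional<String> findText(String html, String css) {
        Document doc = Jsoup.parse(html);

        // Normal case: the element is part of the parsed tree
        Element direct = doc.selectFirst(css);
        if (direct != null) {
            return Optional.of(direct.text());
        }

        // Fallback (assumption): the markup may be wrapped in a comment node
        for (Element el : doc.getAllElements()) {
            for (Node node : el.childNodes()) {
                if (node instanceof Comment) {
                    String inner = ((Comment) node).getData();
                    Element hidden = Jsoup.parse(inner).selectFirst(css);
                    if (hidden != null) {
                        return Optional.of(hidden.text());
                    }
                }
            }
        }
        return Optional.empty(); // caller decides what to do; no null to dereference
    }

    public static void main(String[] args) {
        // Made-up sample: a table hidden inside a comment
        String html = "<div><!-- <table id=\"team_misc\"><tbody><tr>"
                + "<th>Team</th><td data-stat=\"wins\">34</td>"
                + "</tr></tbody></table> --></div>";
        System.out.println(findText(html, "td[data-stat=wins]").orElse("not found"));
    }
}
```

Returning an Optional forces the caller to handle the missing case explicitly. If the lookup still comes back empty, the value is probably rendered by JavaScript and a browser-driven tool is the remaining option.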

A better solution is to use Selenium, a portable framework for testing web applications, which drives a real browser and therefore sees content added by JavaScript:

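import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public static void main(String[] args) {
    String baseUrl = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
    WebDriver driver = new FirefoxDriver();
    try {
        driver.get(baseUrl);
        // Single quotes inside the XPath avoid escaping within the Java string
        String innerText = driver.findElement(
            By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();
        System.out.println(innerText);
    } finally {
        driver.quit(); // always close the browser, even if the lookup fails
    }
}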

You can also try replacing:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();

with this form:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getAttribute("innerHTML");

P.S. In the future, it would be useful to include a link to the source page, or at least a snippet of the DOM structure, instead of an image.
