Trouble extracting a table from a website with jsoup

I'm working on a project that involves extracting a table from a particular site that has several HTML tables. Here's an image highlighting in a red box the specific table I want to extract:

[Image: screenshot of the page with the target table outlined in red]

And my code:

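import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String url = "https://finance.yahoo.com/quote/GOOG/analysts?p=GOOG";
try {
    // Fetch and parse the HTML returned for the initial request
    Document doc = Jsoup.connect(url).get();
    // Pick the eighth <table> on the page (index 7); this is the line that fails
    Element table = doc.select("table").get(7);

    // Print the text of every cell, row by row
    for (Element row : table.select("tr")) {
        Elements cells = row.select("td");
        for (Element cell : cells) {
            System.out.println(cell.text());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}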

However, this code throws an IndexOutOfBoundsException when selecting the table. Lowering the index pulls one of the other tables from the page, and I'm not sure how else to select the particular table I want.

The table in question is loaded asynchronously via AJAX, which is why you get an index out of bounds exception: the table is simply not in the DOM when the initial URL is loaded, and jsoup does not execute JavaScript. Analyze the page load with your browser's developer tools and find the AJAX call that loads the data you need; you can then request that data directly. Alternatively, use a different technology such as Selenium WebDriver to load the content. Selenium WebDriver drives a real browser that executes JavaScript, so it loads and renders the full page, including all AJAX-loaded content. Good luck.
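
One way to confirm this diagnosis is to count the tables present in the raw HTML, which is all a non-JavaScript client gets to see. The following is only a rough sketch using the standard library: the class name TableCount, the method countTables, and the sample markup are hypothetical, and the regex is a crude stand-in for a real parser. With jsoup itself, checking doc.select("table").size() before calling get(7) gives the same information more reliably.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableCount {
    // Matches an opening <table> tag, with or without attributes (hypothetical helper pattern)
    private static final Pattern TABLE_OPEN =
            Pattern.compile("<table[\\s>]", Pattern.CASE_INSENSITIVE);

    /** Counts opening <table> tags in the given raw HTML string. */
    static int countTables(String html) {
        Matcher matcher = TABLE_OPEN.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the initial server response: one static table,
        // plus an empty container that a script would fill in later.
        String staticHtml = "<html><body>"
                + "<table><tr><td>static</td></tr></table>"
                + "<div id=\"late-content\"></div>"
                + "</body></html>";

        int found = countTables(staticHtml);
        int wantedIndex = 7;

        // Guard the index instead of letting the lookup throw
        if (wantedIndex >= found) {
            System.out.println("Static HTML contains " + found
                    + " table(s); index " + wantedIndex + " is out of range.");
        }
    }
}
```

If the count is lower than the number of tables visible in the browser, the missing ones are being added by JavaScript after the initial load, and no index will reach them from the static HTML.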
