Extracting Table Data using JSoup

I'm trying to extract financial information from a table using JSoup. I've reviewed similar questions and can get their examples to work. Here are two:

Using Jsoup to extract data

Using JSoup To Extract HTML Table Contents

I'm not sure why the code doesn't work on my URL.

Below are 3 different attempts. Any help would be appreciated.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ValuationScraper {

    public static void main(String[] args) {
        String s = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";

        // Attempt 1: same page with a differently written query string;
        // print the first two cells of every row that has more than six cells
        try {
            Document doc = Jsoup.connect("http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=USA&culture=en_US").get();

            for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    if (tds.size() > 6) {
                        System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        // Attempt 2: print every cell of every row in the table
        try {
            Document doc = Jsoup.connect(s).get();
            for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    for (int i = 0; i < tds.size(); i++) {
                        System.out.println(tds.get(i).text());
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        // Attempt 3: print body rows only, one "row" marker per row
        try {
            Document doc = Jsoup.connect(s).get();
            Elements tableElements = doc.select("table#currentValuationTable.r_table1.text2");

            // "tbody tr" skips header rows. ":not(thead) tr" would still match them,
            // because every <tr> inside <thead> also has a non-thead ancestor (<table>).
            Elements tableRowElements = tableElements.select("tbody tr");

            for (int i = 0; i < tableRowElements.size(); i++) {
                Element row = tableRowElements.get(i);
                System.out.println("row");
                Elements rowItems = row.select("td");
                for (int j = 0; j < rowItems.size(); j++) {
                    System.out.println(rowItems.get(j).text());
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
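To separate the selector logic from the network fetch, here is a minimal sketch that runs the same table selector against a hard-coded HTML string instead of the live page. It assumes jsoup is on the classpath; the class name `StaticTableDemo`, the helper `extractRows`, and the sample markup (with placeholder numbers, not real AXP data) are hypothetical. If this prints rows but the live URL prints nothing, the selector is not the problem.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class StaticTableDemo {

    // Hypothetical sample markup shaped like the table the question targets.
    // The numbers are placeholders, not real valuation data.
    static final String SAMPLE_HTML =
        "<table id='currentValuationTable' class='r_table1 text2'>"
        + "<thead><tr><th></th><th>Current</th></tr></thead>"
        + "<tbody>"
        + "<tr><td>Price/Earnings</td><td>12.3</td></tr>"
        + "<tr><td>Price/Book</td><td>2.0</td></tr>"
        + "</tbody></table>";

    // Returns "label:value" for every body row that has at least two <td> cells.
    static List<String> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element row : doc.select("table#currentValuationTable.r_table1.text2 tbody tr")) {
            Elements tds = row.select("td");
            if (tds.size() >= 2) {
                out.add(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        extractRows(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

The header row is skipped because it lives in `<thead>` and contains only `<th>` cells, so only the two body rows are returned.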

Answer provided by Psherno:

Print what the Document was able to read from the page (use System.out.println(doc);). I suspect your problem is that the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do because it doesn't execute JavaScript. In that case you should use a more powerful tool, such as a WebDriver (for example, Selenium).
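A minimal sketch of the diagnosis suggested above, assuming jsoup is on the classpath: before debugging selectors, check whether the raw HTML jsoup received contains the table at all. The class name `JsRenderedCheck`, the helper `tableInRawHtml`, and the sample markup are hypothetical; the markup is a stand-in for a page that builds its table in a script, not Morningstar's actual HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsRenderedCheck {

    static final String SELECTOR = "table#currentValuationTable";

    // Hypothetical "server response" where the table exists only inside a script.
    // A browser would run the script; jsoup keeps script contents as plain data.
    static final String JS_RENDERED_HTML =
        "<div id='valuation'></div>"
        + "<script>document.getElementById('valuation').innerHTML ="
        + " \"<table id='currentValuationTable'><tr><td>x</td></tr></table>\";</script>";

    // True if the parsed (not executed) HTML already contains the target table.
    static boolean tableInRawHtml(Document doc) {
        return !doc.select(SELECTOR).isEmpty();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(JS_RENDERED_HTML);
        if (!tableInRawHtml(doc)) {
            // Inspect what jsoup actually received, as the answer recommends.
            System.out.println("Table not found in raw HTML; it is probably added by JavaScript.");
            System.out.println(doc);
        }
    }
}
```

Against the live page you would replace `Jsoup.parse(JS_RENDERED_HTML)` with `Jsoup.connect(s).get()`. If the check fails there, the answer's suggestion applies: load the page in a real browser through Selenium WebDriver and pass `driver.getPageSource()` to `Jsoup.parse`, so the same selectors run against the rendered DOM.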
