
HtmlUnit - scraping data

How can I use HtmlUnit to get the rendered HTML of a page that relies on JavaScript? I found the sample code below, but it does not work.

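import java.util.logging.Level;

import org.apache.commons.logging.LogFactory;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Downloader {

    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit's commons-logging and java.util.logging output.
        LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
        java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

        // Load the page with the default browser version and print its text content.
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.getPage("https://www.oddsportal.com/matches/soccer/");
            System.out.println(page.asText());
        }
        System.out.println("END");
    }
}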

With this code I end up in what looks like an infinite loop, and I don't know why. If I open the site above in the Firefox inspector, I can see the full HTML after the JavaScript has run. How can I get the same result with HtmlUnit? Is it possible? Or should I use another library? Any suggestions?

HtmlUnit tends to have a lot of problems interpreting JavaScript. If you are just looking for the game data, you might have more success another way: https://github.com/gingeleski/odds-portal-scraper

Anyway, I managed to get the code working by changing the BrowserVersion (this needs an import of com.gargoylesoftware.htmlunit.BrowserVersion): final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)
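
Since the original call appeared to hang, it may also help to put a time limit around the fetch so the program fails fast instead of blocking forever. The following is only a sketch using the JDK's java.util.concurrent classes. The class name TimedFetch and the helper runWithTimeout are made up for illustration and are not part of HtmlUnit, and the lambdas in main are stand-ins for the real page fetch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HtmlUnit.
public class TimedFetch {

    // Runs the task on a daemon worker thread and waits at most
    // timeoutMillis for its result; throws TimeoutException otherwise.
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-fetch");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption; the task may ignore it
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a fetch that finishes in time.
        String text = runWithTimeout(() -> "page text", 1000);
        System.out.println(text);

        // Stand-in for a fetch that hangs.
        try {
            runWithTimeout(() -> { Thread.sleep(5000); return "never"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

In the question's code, the task passed in would be something like () -> webClient.getPage("https://www.oddsportal.com/matches/soccer/"). Whether HtmlUnit actually stops working when the thread is interrupted is not guaranteed, so this only bounds how long the caller waits.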
