
HtmlUnit does not process the JavaScript on a page

I have tried almost every method mentioned on Stack Overflow, but none of them worked.

I'm trying to scrape the following page using HtmlUnit: http://www.nseindia.com/corporates/offerdocument/past_issue_document.htm

Only an empty page is returned, which is probably caused by a JavaScript issue. I tried the following tricks in HtmlUnit: waitForBackgroundJavaScript, refresh, redirect, sleep, enabling JavaScript, click(true, true, true), etc. None of them worked.

Any suggestions?

My code:

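import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

String url = "http://www.nseindia.com/corporates/offerdocument/past_issue_document.htm";
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_8);
webClient.setJavaScriptEnabled(true);
HtmlPage page = (HtmlPage) webClient.getPage(url);
// Block until background JavaScript tasks scheduled to start within the next 5 seconds have finished.
webClient.waitForBackgroundJavaScriptStartingBefore(5000);
System.out.println(page.asXml());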

Thanks a lot!

I once had a similar issue. I worked around it with a Firefox developer plugin that logs every request the page's JavaScript makes. I then emulated those requests directly from HtmlUnit: grep the requests from the request log, paste them, and inject the session ID and other miscellaneous parameters, which are usually easy to identify. This is especially useful for sites that use a lot of AJAX.
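The "inject the session parameters" step might look roughly like the sketch below. It uses only the Java standard library and is not part of HtmlUnit. The class name RequestReplay, the method withParams, the example.com URL and the parameter name sessionid are all hypothetical; the real endpoint and parameter names have to be read from the request log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rebuilds a captured request URL with fresh parameter values. */
class RequestReplay {

    /**
     * Returns capturedUrl with every query parameter named in overrides
     * replaced by the override value. Other parameters are kept unchanged,
     * and overrides that are missing from the URL are appended.
     * Assumption: override values are already URL-encoded.
     */
    static String withParams(String capturedUrl, Map<String, String> overrides) {
        int q = capturedUrl.indexOf('?');
        String base = q < 0 ? capturedUrl : capturedUrl.substring(0, q);
        String query = q < 0 ? "" : capturedUrl.substring(q + 1);

        // LinkedHashMap keeps the parameter order of the captured request.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? null : pair.substring(eq + 1);
            params.put(name, value);
        }
        params.putAll(overrides);

        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey());
            if (e.getValue() != null) {
                sb.append('=').append(e.getValue());
            }
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical captured request and session value, for illustration only.
        Map<String, String> fresh = new LinkedHashMap<>();
        fresh.put("sessionid", "NEW");
        System.out.println(withParams("http://example.com/data?sessionid=OLD&page=2", fresh));
    }
}
```

The rebuilt URL can then be passed to webClient.getPage(...) in place of the original page URL. If the endpoint returns JSON or XML rather than HTML, the result is not an HtmlPage, so the body would be read with page.getWebResponse().getContentAsString() instead of asXml().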

