
Load javascript content inside HTML

I have a webpage that I want to load and extract some information from. So far it has gone well: I used HttpClient to load the page and Jsoup to pull out the data. The problem is that some of the page's content is only loaded via JavaScript, and Jsoup does not load it because it does not simulate a browser; it is just an HTML parser.

So I started searching for something that could do this for me and found HtmlUnit. It's a very nice library, but it does not seem to be compatible with Android, as it is very painful to get working (some websites even say it is incompatible with Android because it uses some Swing classes). I also found Selenium's Android WebDriver, but it does not seem very good either, as I would need to install a separate APK to make it work, which may reduce performance.

So, is there any Android-compatible library like Jsoup or HtmlUnit that can emulate a real browser, or that I can give a String of HTML content and have it run the JavaScript inside?

Thanks in advance, and sorry for any English errors. Cheers.

Plainly said, nothing can emulate a browser but a browser itself.

Any library you find will probably suit specific purposes, such as evaluating simple scripts, but I don't think a generic solution exists: modern webpages can involve hundreds of internal and external JS libraries, DOM manipulation, asynchronous requests, and so on. You need a full browser to make that work, not a small library.

So if you are looking for a generic solution, I think the way to go is a WebView (which is indeed a full HTML5 browser): load the webpage into it and extract the data yourself using the interaction possibilities WebView gives you. Note that the WebView can be invisible while you extract the info.

So check out the docs on WebView. There are tons of functions you can use and override to control how it works: you can set hooks that are called when the page tries to load scripts, CSS, or files; intercept calls; substitute data; call JavaScript from Android; get parts of the webpage as text or images; and so on.

http://developer.android.com/reference/android/webkit/WebView.html

Take a look at the functions evaluateJavascript and loadData, and at the WebChromeClient you can set with setWebChromeClient.

http://developer.android.com/reference/android/webkit/WebChromeClient.html

This object has a ton of functions you can use to intercept whatever is happening in the loaded page, such as onJsAlert, onJsTimeout, and onReceivedTitle.
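As a rough illustration of those hooks, here is a minimal sketch that logs alert() calls and title changes instead of letting the page show a dialog, which matters if the WebView is invisible. The class name PageEventLogger and the log tag are made up for this example; only the overridden methods come from WebChromeClient.

```java
import android.util.Log;
import android.webkit.JsResult;
import android.webkit.WebChromeClient;
import android.webkit.WebView;

/** Hypothetical helper; the class name and TAG are made up for this example. */
public final class PageEventLogger {

    private static final String TAG = "PageEventLogger";

    private PageEventLogger() {}

    /** Call on the main thread, like every other WebView method. */
    public static void attach(WebView webView) {
        webView.setWebChromeClient(new WebChromeClient() {
            @Override
            public boolean onJsAlert(WebView view, String url, String message, JsResult result) {
                Log.d(TAG, "alert() from " + url + ": " + message);
                result.confirm(); // tell the page the alert was acknowledged so its script continues
                return true;      // true = handled here, so no dialog is shown
            }

            @Override
            public void onReceivedTitle(WebView view, String title) {
                Log.d(TAG, "title: " + title);
            }
        });
    }
}
```

Calling PageEventLogger.attach(webView) before loading the page should be enough to see the messages in Logcat.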

Also check out WebView's setWebViewClient, which lets you set a WebViewClient that provides a lot of hooks as well, such as onPageFinished, onPageStarted, and onReceivedError.

http://developer.android.com/reference/android/webkit/WebViewClient.html
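Putting the pieces together, a minimal sketch of the invisible-WebView idea could look like the code below. It is not a definitive implementation: the RenderedHtmlFetcher class and its Callback interface are hypothetical names, and it assumes the page's scripts have finished by the time onPageFinished fires, which is not guaranteed for pages that load data asynchronously (those may need a delay or polling). It also assumes API level 19 or higher for evaluateJavascript and the INTERNET permission in the manifest.

```java
import android.annotation.SuppressLint;
import android.content.Context;
import android.webkit.WebView;
import android.webkit.WebViewClient;

import org.json.JSONException;
import org.json.JSONTokener;

/** Hypothetical helper: loads a page in an off-screen WebView and reports its HTML after scripts ran. */
public final class RenderedHtmlFetcher {

    /** Hypothetical callback; not part of the Android SDK. */
    public interface Callback {
        void onHtml(String html); // html is null if the result could not be decoded
    }

    private RenderedHtmlFetcher() {}

    /** Must be called on the main (UI) thread, like every WebView method. */
    @SuppressLint("SetJavaScriptEnabled")
    public static WebView fetch(Context context, String url, final Callback callback) {
        // The WebView is never added to a layout, so nothing appears on screen.
        WebView webView = new WebView(context);
        webView.getSettings().setJavaScriptEnabled(true); // JavaScript is off by default

        webView.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(WebView view, String finishedUrl) {
                // Assumption: the content of interest is already in the DOM here.
                // This hook can also fire more than once (for example on redirects).
                view.evaluateJavascript("document.documentElement.outerHTML",
                        value -> callback.onHtml(decode(value)));
            }
        });

        webView.loadUrl(url);
        return webView; // keep a reference and call destroy() when finished
    }

    /** evaluateJavascript delivers a JSON value, so a string arrives quoted and escaped. */
    private static String decode(String jsonValue) {
        if (jsonValue == null) {
            return null;
        }
        try {
            Object parsed = new JSONTokener(jsonValue).nextValue();
            return parsed instanceof String ? (String) parsed : null;
        } catch (JSONException e) {
            return null;
        }
    }
}
```

The decoded string can then be handed to the parser the question already uses, for example Jsoup.parse(html, url), so the existing extraction code keeps working on the rendered page.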

