
Parse a page (partly generated by JavaScript) using Selenium

I've got a problem: I want to parse a page (e.g. this one) to collect information about the offered apps and save this information into a database.

I am using crawler4j to visit every (available) page. The problem, as far as I can see, is that crawler4j needs links in the page source in order to follow them.

But in this case the hrefs are generated by JavaScript, so crawler4j does not find any new links to visit / pages to crawl.

So my idea was to use Selenium so that I can inspect elements as in a real browser like Chrome or Firefox (I'm quite new to this).

But, to be honest, I don't know how to get the "generated" HTML instead of the source code.

Can anybody help me?

To inspect elements, you do not need the Selenium IDE; just use Firefox with the Firebug extension. Also, with the developer tools add-on you can view a page's source as well as the generated source (this is mainly for PHP).

Crawler4J cannot handle JavaScript like this. It is better left to another, more advanced crawling library. See this response here:

Web Crawling (Ajax/JavaScript enabled pages) using java
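
If you do go the Selenium route, the rendered DOM (the HTML after JavaScript has run) is available via WebDriver's getPageSource(). Below is a minimal sketch in Java, assuming Selenium WebDriver and geckodriver are installed; the URL and the CSS selector are placeholders, not taken from the original page:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

import java.util.List;

public class RenderedHtmlExample {
    public static void main(String[] args) {
        // Requires geckodriver on the PATH (or set via the
        // webdriver.gecko.driver system property).
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/apps");   // placeholder URL

            // getPageSource() returns the DOM as currently rendered,
            // i.e. including elements inserted by JavaScript.
            String renderedHtml = driver.getPageSource();
            System.out.println(renderedHtml);

            // The JavaScript-generated links can now be collected and
            // handed to your crawler, or followed directly with Selenium.
            List<WebElement> links = driver.findElements(By.cssSelector("a[href]"));
            for (WebElement link : links) {
                System.out.println(link.getAttribute("href"));
            }
        } finally {
            driver.quit();
        }
    }
}
```

Depending on how late the page injects its links, you may need an explicit wait (WebDriverWait with ExpectedConditions) before reading the page source, so that the JavaScript has finished running.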
