
Parse a page (partly generated by JavaScript) using Selenium

I've got a problem: I want to parse a page (e.g. this one) to collect information about the offered apps and save this information into a database.

I am using crawler4j to visit every (available) page. The problem, as far as I can see, is that crawler4j needs links in the page source in order to follow them.

But in this case the hrefs are generated by JavaScript, so crawler4j does not find any new links to visit / pages to crawl.

So my idea was to use Selenium so that I can inspect elements as in a real browser like Chrome or Firefox (I'm quite new to this).

But, to be honest, I don't know how to get the "generated" HTML instead of the source code.

Can anybody help me?

To inspect elements, you do not need the Selenium IDE; just use Firefox with the Firebug extension. Also, with the developer tools add-on you can view a page's source as well as the generated source (this is mainly for PHP).

Crawler4J cannot handle JavaScript like this. It is better left to another, more advanced crawling library. See this response here:

Web Crawling (Ajax/JavaScript enabled pages) using java
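
If you do go the Selenium route, the rendered DOM (the HTML after JavaScript has run) is available via WebDriver's getPageSource(). Below is a minimal sketch in Java, assuming Selenium WebDriver and geckodriver are installed; the URL and the CSS selector are placeholders, not taken from the original page:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

import java.util.List;

public class RenderedHtmlExample {
    public static void main(String[] args) {
        // Requires geckodriver on the PATH (or set via the
        // webdriver.gecko.driver system property).
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/apps");   // placeholder URL

            // getPageSource() returns the DOM as currently rendered,
            // i.e. including elements inserted by JavaScript.
            String renderedHtml = driver.getPageSource();
            System.out.println(renderedHtml);

            // The JavaScript-generated links can now be collected and
            // handed to your crawler, or followed directly with Selenium.
            List<WebElement> links = driver.findElements(By.cssSelector("a[href]"));
            for (WebElement link : links) {
                System.out.println(link.getAttribute("href"));
            }
        } finally {
            driver.quit();
        }
    }
}
```

Depending on how late the page injects its links, you may need an explicit wait (WebDriverWait with ExpectedConditions) before reading the page source, so that the JavaScript has finished running.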
