How to use Python to scrape the text from a page generated by JavaScript?

Question

I'm looking for a way, on Linux, to write a script that scrapes the text from a page that is generated by JavaScript (specifically Etherpad, e.g. http://www.board.net). Ideally I'd like to use an existing tool, but I haven't found a suitable one: lynx doesn't support JavaScript, and Selenium runs in a browser. Suggestions welcome.

If there's nothing suitable (which would seem surprising for such a simple need), maybe I can write something myself in Python. What useful Python classes exist for something like this?

Answer 1

One option is to stick with Selenium but drive a headless browser such as PhantomJS.

See also:

Headless Selenium Testing with Python and PhantomJS

Example (using the Firefox WebDriver):

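from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://board.net/p/ThisIsBob%27sBoard/timeslider'
driver = webdriver.Firefox()  # starts a real Firefox instance
try:
    driver.get(url)
    # The pad text is rendered by JavaScript into the element with id "padcontent".
    # find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed.
    element = driver.find_element(By.ID, 'padcontent')
    print(element.text)
finally:
    driver.quit()  # always close the browser, even if the lookup fails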

prints:

Here is some text I'd like to scrape
 I wonder how to go about it?
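
The example above opens a visible Firefox window, which is what the question wanted to avoid. PhantomJS is no longer maintained, so the sketch below assumes Firefox's own headless mode instead, which is a substitution and not what the answer originally proposed. It also polls for the text, since a JavaScript-rendered element may be empty or missing right after the page loads. The helper names (make_headless_firefox, wait_for_text, scrape_text) and the timeout values are hypothetical, chosen for illustration. Firefox and geckodriver are assumed to be installed.

```python
import time


def make_headless_firefox():
    """Start Firefox without a visible window (assumes Firefox and geckodriver)."""
    # Imported lazily so the polling helper below has no hard Selenium dependency.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def wait_for_text(driver, element_id, timeout=10.0, poll=0.5):
    """Poll until the element with this id has non-empty text, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "id" is the locator strategy string behind Selenium's By.ID.
            text = driver.find_element("id", element_id).text
            if text.strip():
                return text
        except Exception:
            # The element may not exist yet while the page's JavaScript is running.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text in #{element_id} after {timeout} seconds")
        time.sleep(poll)


def scrape_text(url, element_id):
    """Load url in headless Firefox and return the rendered text of one element."""
    driver = make_headless_firefox()
    try:
        driver.get(url)
        return wait_for_text(driver, element_id)
    finally:
        driver.quit()
```

With those assumptions, print(scrape_text('http://board.net/p/ThisIsBob%27sBoard/timeslider', 'padcontent')) should print the same pad text as the example above, without opening a window. Selenium's own WebDriverWait could replace the hand-written polling loop; the loop is used here only to keep the waiting logic visible.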

Answer 1 was accepted on 2014-04-17 15:19:42.
