
How to scrape a JavaScript webpage using Python standard libs only

I have to scrape a website that uses JavaScript to display its content. I can only use the standard library, because the script will run on a server with no browser installed. I found Selenium, but it requires a browser, which I cannot install in my case.

Any ideas or solutions?

Have a look at Ghost.py (http://jeanphix.me/Ghost.py/). It doesn't require a browser.

pip install Ghost.py

from ghost import Ghost
ghost = Ghost()
page, resources = ghost.open('http://stackoverflow.com/')

You didn't say how the website uses JavaScript, but if it loads content through AJAX requests that are triggered by user interaction, you will need something like Selenium to automate that behaviour. There is a short tutorial on how to scrape with Scrapy + Selenium. This, of course, requires a browser to be installed on your machine.
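If the content turns out to come from an AJAX request that returns JSON, it may be possible to skip the browser and call that endpoint directly with the standard library. The following is only a minimal sketch under that assumption: `fetch_json` is a hypothetical helper name, the `User-Agent` value is a placeholder, and the real endpoint URL has to be found by inspecting the page's network traffic.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its body as JSON, using only the standard library."""
    # Assumption: the endpoint answers a plain GET request. Some servers
    # reject requests without a browser-like User-Agent; this value is a
    # placeholder, not something the target site is known to require.
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


# Hypothetical usage (the URL is illustrative, not a real endpoint):
# data = fetch_json("https://example.com/api/items?page=1")
```

This only works when the data is available from an endpoint that can be requested without running the page's JavaScript. If the page builds its content purely in the browser, a real rendering engine is still needed.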

