
How do I scrape website HTML after Javascript has run?

I was trying to scrape a website. When I scrape it, the result isn't the same as what I get when I right-click and view the page source in Firefox or Google Chrome.

The code I used:

import urllib

page = urllib.urlopen("http://www.google.com/search?q=python")
# or any other website that uses search
python = page.read()
print python

It turns out that the code only fetches the 'raw' web page, which isn't what I wanted. For websites like this, I want the HTML after JavaScript etc. has run, so that the result is the same as what you get when you right-click and view the source in your browser.

Is there any other way of doing this?

It's not exactly a raw page: what you got back is an error page from Google. The top of the output from print python says:

Your client does not have permission to get URL /search?q=python from this server.
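A common cause of this kind of refusal is the default User-Agent that urllib sends, though that is an assumption here and not something the error message confirms. A minimal Python 3 sketch that sends a browser-like header instead; BROWSER_USER_AGENT, build_request and fetch are hypothetical names, and the server may still refuse the request:

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value would do.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch the URL and return the response body decoded as text."""
    with urllib.request.urlopen(build_request(url)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Even when the request succeeds, fetch() still returns the HTML as the server sent it; no JavaScript is executed.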

If you were to change your page variable to

page = urllib.urlopen("http://volt.al/")

you'd see the JavaScript.
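What you see in that case is the script source, not its result, because urllib never executes it. A small Python 3 sketch, using only the standard library, that lists the script tags in a fetched page; ScriptCollector and find_scripts are hypothetical names introduced for illustration:

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute and inline code of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._current = None  # the <script> entry being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current = {"src": dict(attrs).get("src"), "code": ""}

    def handle_data(self, data):
        # Text inside a <script> element is its (unexecuted) source code.
        if self._current is not None:
            self._current["code"] += data

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self.scripts.append(self._current)
            self._current = None


def find_scripts(html):
    """Return a list of {'src': ..., 'code': ...} dicts, one per <script>."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

For a page whose content is filled in by a script, the target element is still empty in the fetched HTML, and find_scripts() shows the code that would have filled it.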

Try it out with different pages to see what you like.
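To get the HTML after the scripts have run, which is what the question asks for, one option is to let a real browser load the page and then read back its DOM. A minimal Python 3 sketch, assuming the third-party selenium package (version 4 or later) and a local Chrome installation, neither of which is mentioned in the original post; fetch_rendered_html is a hypothetical name:

```python
def fetch_rendered_html(url):
    """Load url in headless Chrome and return the DOM serialized as HTML."""
    # Imported here so that defining the function does not require selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source reflects the DOM at this moment, after scripts that
        # ran during page load; content loaded later may need an explicit wait.
        return driver.page_source
    finally:
        driver.quit()
```

A call such as fetch_rendered_html("http://volt.al/") would then return markup closer to what the browser's element inspector shows than to the raw response.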

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 