So I was trying to scrape a website, but the result isn't the same as what I get when I right-click and choose "View page source" in Mozilla Firefox or Google Chrome.
The code I used:
import urllib
page = urllib.urlopen("http://www.google.com/search?q=python")
# or any other website that uses search
python = page.read()
print python
It turns out that the code only fetches the 'raw' web page, which isn't what I want. For websites like this, I want the HTML after JavaScript etc. has run, so that the result is the same as right-clicking and viewing the source in my browser.
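For reference, here is a minimal Python 3 sketch of the same fetch. The snippet above is Python 2; in Python 3, `urllib.urlopen` became `urllib.request.urlopen`. The function name `fetch_raw_html` and the UTF-8 fallback are assumptions made for illustration. Like the original, it only returns the HTML the server sends and does not run any JavaScript.

```python
import urllib.request


def fetch_raw_html(url):
    """Return the page exactly as the server sends it (hypothetical helper).

    No JavaScript is executed: <script> elements come back as plain text,
    just as with the Python 2 urllib.urlopen() call.
    """
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_raw_html("http://volt.al/")` would return that page's markup with its `<script>` tags as unexecuted text.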
Is there any other way of doing this?
It's not exactly a raw page; it's an error page that Google sent back to you. At the top of the output of print python, the message says:
Your client does not have permission to get URL /search?q=python from this server.
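If you port this to Python 3, note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for an error status such as 403 instead of handing back the error page. A minimal sketch, assuming Python 3.9+ (for `response.status`); `fetch_with_status` is a hypothetical name. It catches the exception and still returns the page the server sent, so you can tell an error page apart from real results:

```python
import urllib.error
import urllib.request


def fetch_with_status(url):
    """Return (status, body_text) without raising on HTTP error statuses."""
    try:
        with urllib.request.urlopen(url) as response:
            # response.status is the HTTP status code (None for non-HTTP URLs).
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response: err.code is the status and
        # err.read() returns the error page body the server sent.
        return err.code, err.read().decode("utf-8", errors="replace")
```

If the server refuses the request, the status returned is a 4xx code (for a "does not have permission" page, presumably 403) rather than 200, which is a quick check before trusting the body.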
If you change your page variable to
page = urllib.urlopen("http://volt.al/")
you'll see the JavaScript source in the output.
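To confirm that this JavaScript is only text in the downloaded page (urllib never executes it), a small sketch using the standard library's `html.parser` can pull out the contents of each `<script>` element. `ScriptCollector` and `extract_scripts` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the source text inside every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # one entry per <script>, even if empty

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html):
    """Return the unexecuted script source found in a raw HTML string."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts
```

For instance, in `<div id="out"></div><script>document.getElementById("out").textContent = "hi";</script>` the div stays empty in the raw HTML; only something that actually runs JavaScript, such as a browser, would fill it in.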
Try it with different pages and see what you get.