
Different data in urllib2 than in Safari's Web Inspector

I looked here and here for information on my issue, but with no luck.

I wrote some Python code that is intended to grab a webpage's source, as shown in Safari's Web Inspector. However, my application and Safari's Web Inspector return different code. Here is my code so far:

#!/usr/bin/python

import urllib2

# Request headers copied from Safari
hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Cache-Control': 'max-age=0'}

# Build the request
req = urllib2.Request("https://www.google.com/#q=rainbow&safe=active", headers=hdr)

# Fetch the page and print the response headers, then the body
try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # On an HTTP error, print the error body instead
    print e.fp.read()
else:
    print page.info()
    print page.read()

And the headers match up to what is in Web Inspector:

[Screenshot: Web Inspector]


The code returned is different, though, for a Google search for "rainbow".

My Python:

http://paste.ubuntu.com/6270549/

Web Inspector:

http://paste.ubuntu.com/6270606/

As far as I can tell, my code is missing a large number of the ubiquitous }catch(e){gbar_._DumpException(e)} lines that are present in the Web Inspector code. Also, my code only has 78 lines, while the Web Inspector code has 235 lines. Does this mean that my code is not getting all of the JavaScript or some other portion of the webpage? How can I get my code to retrieve the same data as the Web Inspector?

You are using the wrong URL for a Google search. The correct URL is:

https://www.google.com/search?q=rainbow&safe=active

instead of:

https://www.google.com/#q=rainbow&safe=active

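The second URL returns Google's homepage when requested from Python. Everything after the # is a URL fragment, which is handled by the browser and is not part of the request sent to the server, so Google only sees a request for https://www.google.com/. In Safari, JavaScript on the page reads the fragment and loads the search results afterwards. This is why the code is different.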
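To see the difference between the two URLs without making any network request, here is a minimal sketch. It assumes Python 3, where the URL helpers live in urllib.parse rather than in urllib2 and urlparse. The helper names request_target and build_search_url are made up for this illustration and are not part of any library.

```python
from urllib.parse import urlencode, urlsplit


def request_target(url):
    """Return the path plus query string, i.e. the part of the URL that
    ends up on the HTTP request line. The fragment is deliberately
    left out, because it is not sent to the server.
    (Hypothetical helper, for illustration only.)"""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def build_search_url(query, safe="active"):
    """Build a /search URL with a real query string instead of a fragment.
    (Hypothetical helper, for illustration only.)"""
    return "https://www.google.com/search?" + urlencode({"q": query, "safe": safe})


fragment_url = "https://www.google.com/#q=rainbow&safe=active"
search_url = build_search_url("rainbow")

# The search terms sit in the fragment, so the server never sees them.
print(urlsplit(fragment_url).fragment)  # q=rainbow&safe=active
print(request_target(fragment_url))     # /

# With /search?..., the terms are part of the query string and are sent.
print(search_url)                       # https://www.google.com/search?q=rainbow&safe=active
print(request_target(search_url))       # /search?q=rainbow&safe=active
```

Passing the URL from build_search_url to the request in the question, in place of the #q= URL, should make the script ask for the search results page rather than the homepage. The HTML may still differ from what Web Inspector shows, since the browser also runs the page's JavaScript.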
