Switching from urllib2 to requests, strangely different results with the same parameters

I'm trying to grab a cookie from a POST request. Previously I used urllib2, which still works fine, but I wanted to switch to the clearer python-requests library. Unfortunately, with requests I get an error on the page.

Since the requests go over HTTPS, I can't sniff them to locate the difference.
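
HTTPS only hides the traffic on the wire; the client itself can still print what it sends and receives. A minimal sketch, assuming Python 3 (where urllib2 became urllib.request and httplib became http.client). The two function names are illustrative and not part of the original code:

```python
import http.client
import urllib.request


def enable_wire_debug():
    """Make http.client print request and response headers for new connections."""
    # requests sits on top of urllib3, which builds on http.client, so this
    # class-level switch is expected to cover its connections as well.
    http.client.HTTPConnection.debuglevel = 1


def build_debug_opener():
    """Return a urllib opener whose HTTP and HTTPS handlers echo headers to stdout."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )
```

Calling enable_wire_debug() before requests.post(...), or opening the login URL through build_debug_opener().open(...), should show the headers of every hop, including redirects.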

urllib2 code:

import urllib2
from urllib import urlencode

NINTENDO_LOGIN_PAGE = "https://id.nintendo.net/oauth/authorize/"
MIIVERSE_CALLBACK_URL = "https://miiverse.nintendo.net/auth/callback"
parameters = {'client_id': 'ead88d8d450f40ada5682060a8885ec0',
              'response_type': 'code',
              'redirect_uri': MIIVERSE_CALLBACK_URL,
              'username': MIIVERSE_USERNAME,
              'password': miiverse_password}

data = urlencode(parameters)
self.logger.debug(data)
req = urllib2.Request(NINTENDO_LOGIN_PAGE, data)
page = urllib2.urlopen(req).read()
self.logger.debug(page)

Result (good):

[...]
<div id="main-body">
    <div id="try-miiverse">
        <p class="try-miiverse-catch">A glimpse at some of the posts that are currently popular on Miiverse.</p>
        <h2 class="headline">Miiverse Sampler</h2>
        <div id="slide-post-container" class="list post-list">
        [...]

Requests code:

req = requests.post(NINTENDO_LOGIN_PAGE, data=parameters)
self.logger.debug(req.text)

Result (bad):

[...]
<div id="main-body">
    <h2 class="headline">Activity Feed</h2>

    <div class="activity-feed content-loading-window">
        <div>
            <img src="https://d13ph7xrk1ee39.cloudfront.net/img/loading-image-green.gif" alt=""></img>
            <p class="tleft"><span>Loading activity feed...</span></p>
        </div>
    </div>
    <div class="activity-feed content-load-error-window none"><div>
    <p>The activity feed could not be loaded. Check your Internet connection, wait a moment and then try reloading.</p>
    <div class="buttons-content"><a href="/" class="button">Reload</a></div>
    </div>
</div>
[...]

Thanks in advance for any hints towards solving this.

Update 1: Thank you all for your responses!

As suggested by @abarnert, I checked the redirects.

resp = urllib2.urlopen(req)
print(resp.geturl()) # https://miiverse.nintendo.net/

req = requests.post(NINTENDO_LOGIN_PAGE, data=parameters)
print(req.url) # https://miiverse.nintendo.net/
print(req.history) # (<Response [303]>, <Response [302]>)

It seems both of them followed redirects and ended up in the same place.
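
To look at each hop rather than only the final URL, the chain can be summarized in one pass. A small sketch, assuming a requests-style response object with status_code, url, headers and history attributes; describe_redirects is a hypothetical helper name:

```python
def describe_redirects(response):
    """List (status_code, url, sets_cookie) for every hop, final response last."""
    hops = list(response.history) + [response]
    summary = []
    for hop in hops:
        # Header names are case-insensitive, so compare them lower-cased.
        sets_cookie = any(name.lower() == "set-cookie" for name in hop.headers)
        summary.append((hop.status_code, hop.url, sets_cookie))
    return summary
```

With the code above this would be called as describe_redirects(req), which shows at a glance which response in the chain carries a Set-Cookie header.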

@sigmavirus24, that is a very useful website, thank you for pointing me to it. Here are the results (I reordered the parameters so they are easy to compare):

urllib2:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "client_id": "ead88d8d450f40ada5682060a8885ec0",
    "response_type": "code",
    "redirect_uri": "https://miiverse.nintendo.net/auth/callback",
    "username": "Wiwiweb",
    "password": "password"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "170",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/2.7"
  },
  "json": null,
  "origin": "24.85.129.188",
  "url": "http://httpbin.org/post"
}

requests:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "client_id": "ead88d8d450f40ada5682060a8885ec0",
    "response_type": "code",
    "redirect_uri": "https://miiverse.nintendo.net/auth/callback",
    "username": "Wiwiweb",
    "password": "password"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, compress",
    "Connection": "close",
    "Content-Length": "170",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/1.2.3 CPython/2.7.5 Windows/7"
  },
  "json": null,
  "origin": "24.85.129.188",
  "url": "http://httpbin.org/post"
}

It looks like some headers are slightly different. I don't have any other ideas, so I might as well try to copy the urllib2 headers exactly. Spoofing the user agent might do it.
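
The comparison can also be done mechanically instead of by eye. A minimal sketch, where diff_headers is a hypothetical helper and the two dictionaries hold only the differing entries from the "headers" sections shown above:

```python
def diff_headers(left, right):
    """Return {header: (left_value, right_value)} for headers that differ."""
    # Header names are case-insensitive, so compare them lower-cased.
    left = {name.lower(): value for name, value in left.items()}
    right = {name.lower(): value for name, value in right.items()}
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }


urllib2_headers = {
    "Accept-Encoding": "identity",
    "User-Agent": "Python-urllib/2.7",
}
requests_headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, compress",
    "User-Agent": "python-requests/1.2.3 CPython/2.7.5 Windows/7",
}

for name, (from_urllib2, from_requests) in diff_headers(urllib2_headers, requests_headers).items():
    print(name, "|", from_urllib2, "|", from_requests)
```

A header that is missing on one side shows up with None for that side.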

Update 2: I have added these headers to the requests call:

headers = {'User-Agent': 'Python-urllib/2.7',
           'Accept-Encoding': 'identity'}

I am still getting the same results... The only remaining difference between the two requests is that the requests one has an extra header: "Accept": "*/*". I'm not sure this is the problem.

Could it be coming from the redirect?

Well, I didn't quite solve "why" the redirects are different, but I found out where to get my cookie using requests.

I figured the difference between the two libraries had something to do with the way they handle redirects, so I checked the history of both requests. For requests that's as easy as doing req.history, but for urllib2 I used this bit of code:

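class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print("New request:")
        print(headers)
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # The base class aliases 301/303/307 to its own http_error_302, so re-alias
    # them here; otherwise only 302 responses would be printed.
    http_error_301 = http_error_303 = http_error_307 = http_error_302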

opener = urllib2.build_opener(MyHTTPRedirectHandler, urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

Checking the history showed me that, with requests, the Set-Cookie header arrives on the response to the second of the three requests (the one made after the first redirect), but not on the final response. That's good enough for me, because I now know where to get it: req.history[1].cookies['ms']
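
Indexing history by position breaks if the server ever adds or removes a redirect. A slightly more defensive sketch, assuming a requests-style response whose hops each expose a cookies mapping; find_cookie is a hypothetical helper name:

```python
def find_cookie(response, name):
    """Return the first value of cookie `name` set along the redirect chain, or None."""
    for hop in list(response.history) + [response]:
        if name in hop.cookies:
            return hop.cookies[name]
    return None
```

Here that would be find_cookie(req, 'ms'). Another option, untested against this site, is to send the POST through a requests.Session() and read the cookie from session.cookies afterwards, since a session accumulates cookies across the whole chain.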

As a curious aside, once I added that bit to the urllib2 code, it started returning the same thing as the requests call! Even reducing it to this:

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    pass

opener = urllib2.build_opener(MyHTTPRedirectHandler, urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

is enough to make it change its response completely, to the same thing requests returned all along (the result I marked as 'bad' in the question).
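
One possible explanation, offered as an assumption rather than something confirmed in this thread: both snippets above also pass urllib2.HTTPCookieProcessor() to build_opener, whereas the default opener behind urllib2.urlopen has no cookie handling. Without it, a cookie set by a redirect response is dropped before the next hop. With it (and, apparently, with requests) the cookie is sent along, so the server sees a logged-in client and serves a different page. A self-contained sketch in Python 3 (urllib.request in place of urllib2) against a made-up local server; DemoHandler, the /login and /home paths and the cookie value are all hypothetical:

```python
import http.server
import threading
import urllib.request
from http.cookiejar import CookieJar


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in login flow: /login sets a cookie and redirects to /home."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "ms=demo-token; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Any other path: the body depends on whether the cookie came back.
        has_cookie = "ms=demo-token" in self.headers.get("Cookie", "")
        body = b"activity feed" if has_cookie else b"sampler"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_both():
    """Fetch /login without and with a cookie jar; return both final bodies."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/login" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
    try:
        plain = urllib.request.build_opener(no_proxy)
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(CookieJar()))
        return plain.open(url).read(), with_jar.open(url).read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    without_jar, with_cookie_jar = fetch_both()
    print("no cookie jar:  ", without_jar)
    print("with cookie jar:", with_cookie_jar)
```

If this is what happens with the real site, the "good" page would be what a logged-out visitor sees, and the "bad" page would be the logged-in activity feed, whose content is loaded by JavaScript that a plain HTTP client never runs.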

I'm stumped, but knowing where to find the cookie is good enough for me. Maybe someone curious and bored will be interested in trying to find the cause.

Thank you all for your help :)
