
Python 3.7 urllib.request doesn't follow redirect URL

I'm using Python 3.7 with urllib. Everything works fine, but it does not seem to redirect automatically when it gets an HTTP redirect response (307).

This is the error I get:

ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect

I have to handle it with a try-except and manually send another request to the new Location. It works fine, but I don't like it.

This is the piece of code I use to perform the request:

      req = urllib.request.Request(url)
      req.add_header('Authorization', auth)
      req.add_header('Content-Type','application/json; charset=utf-8')
      req.data=jdati  
      self.logger.debug(req.headers)
      self.logger.info(req.data)
      resp = urllib.request.urlopen(req)

url is an HTTPS resource, and I set headers with some authorization info and the content type. req.data is JSON.

From the urllib documentation I understood that redirects are performed automatically by the library itself, but that doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL. I've also tried using an opener that specifies the default redirect handler, but with the same result:

  opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)          
  req = urllib.request.Request(url)
  req.add_header('Authorization', auth)
  req.add_header('Content-Type','application/json; charset=utf-8')
  req.data=jdati  
  resp = opener.open(req)         

What could be the problem?

The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:

If the 307 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Back to the question - given that data has been assigned, get_method automatically returns POST (as per how this method was implemented), and since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.

As an experiment, try the following code:
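To see the effect of assigning data without sending anything over the network, here is a minimal sketch. The host example.invalid and the JSON payload are placeholders chosen for illustration; they are not from the question.

```python
import urllib.request

# example.invalid is a placeholder host; constructing a Request sends nothing.
req = urllib.request.Request('http://example.invalid/resource')
method_without_data = req.get_method()  # no body attached, defaults to GET

# Assigning a body is what switches the default method to POST.
req.data = b'{"key": "value"}'
method_with_data = req.get_method()

print(method_without_data, method_with_data)  # prints: GET POST
```

Once get_method reports POST, a 307 response is no longer followed automatically by the default redirect handler.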

import urllib.error
import urllib.parse
import urllib.request


url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url 
print('Response URL -> %s ' % resp.url)  # the final url

Running the code as is may produce the following:

Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get 

Note that the subsequent redirect to get was done automatically, as the subsequent request was a GET request. Commenting out the req.data assignment line will result in the lack of the "Redirected" output line.
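If wrapping every call in a try-except is unappealing, one possible alternative is a custom redirect handler. The sketch below is not part of urllib: the class name PostPreservingRedirectHandler is hypothetical, and it deliberately opts out of the confirmation rule quoted from the RFC, so it only makes sense when re-sending the body to the new location is known to be acceptable.

```python
import urllib.request


class PostPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that re-sends method and body on a 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.data is not None:
            # Build a new request for the redirect target, keeping the
            # original method, body and headers. Caution: this forwards
            # headers such as Authorization to whatever host the server
            # names in Location.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method=req.get_method(),
            )
        # Everything else keeps the standard library behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Passing an instance replaces the default HTTPRedirectHandler in the opener.
opener = urllib.request.build_opener(PostPreservingRedirectHandler())
```

With this in place, opener.open(req) would be used instead of urllib.request.urlopen(req); whether the target server accepts the replayed request is something to verify against the actual service.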

Other things worth noting in the exception handling block: e.read() may be called to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response that may be processed), and urljoin is needed because the Location header may be a relative URL (or simply have the host missing) to the subsequent resource.

Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that those never got any answers.
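To illustrate why urljoin is used rather than taking the Location value verbatim, here is a small sketch. The Location values are made up for illustration, and example.invalid is a placeholder host.

```python
from urllib.parse import urljoin

base = 'http://httpbin.org/status/307'

# Location with the host missing: resolved against the original URL.
host_missing = urljoin(base, '/redirect/1')

# Location relative to the current path: the last path segment is replaced.
path_relative = urljoin(base, 'other')

# Location that is already absolute: returned unchanged.
absolute = urljoin(base, 'https://example.invalid/next')

print(host_missing)   # http://httpbin.org/redirect/1
print(path_relative)  # http://httpbin.org/status/other
print(absolute)       # https://example.invalid/next
```

In all three cases the result is a full URL that can be passed straight to urlopen.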
