HTTP 404 status code (Not Found) shown as 302
I'm trying to retrieve the HTTP status codes for a list of URLs in Python, using the following piece of code:
import requests

try:
    r = requests.head(testpoint_url)
    # Print the URL followed by the integer status code.
    print(testpoint_url + " : " + str(r.status_code))
except requests.ConnectionError:
    print("failed to connect")
Surprisingly, for some URLs I get a 302 status code, whereas opening the same URL in a browser shows a 404!
What is going on? How can I get the real status code (e.g. 404)?
302 is an HTTP redirection. A web browser will follow the redirect to the URL reported in the Location response header. The request for that next URL gets its own response code, which can be 404.
Your Python code does not follow the redirect, which would explain why it gets the original 302 instead.
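To make the two-step exchange concrete, here is a minimal sketch that uses only the standard library (not Requests) so the raw responses are visible. The handler class, the /old and /missing paths, and the head() helper are all made up for illustration; a real site will behave according to its own configuration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectThen404(BaseHTTPRequestHandler):
    """Toy handler (hypothetical): /old redirects to /missing, which does not exist."""

    def do_HEAD(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/missing")
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def head(port, path):
    """Send one HEAD request; http.client never follows redirects on its own."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    status, location = resp.status, resp.getheader("Location")
    conn.close()
    return status, location


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectThen404)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first_status, location = head(port, "/old")   # the redirect itself
second_status, _ = head(port, location)       # what a browser would request next
print(first_status, location, second_status)

server.shutdown()
server.server_close()
```

The first request sees only the redirect; the 404 shows up only once the Location target is requested, which is the step a browser performs automatically.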
Per the Requests documentation:
Redirection and History
By default Requests will perform location redirection for all verbs except HEAD.
We can use the history property of the Response object to track redirection.
The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.
...
...
If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:
>>> r = requests.get('https://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]
If you're using HEAD, you can enable redirection as well:
>>> r = requests.head('https://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
So, in your code, change this:
r = requests.head(testpoint_url)
To this:
r = requests.head(testpoint_url, allow_redirects=True)
Then r.status_code will be the final status code (i.e., 404) after all redirects have been followed.
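Putting this together for a whole list of URLs, something along these lines should work. It is only a sketch: the report_status name, the injectable head parameter (there so the function can be exercised without network access), and the 10-second timeout are my own assumptions, not part of the original code.

```python
import requests


def report_status(urls, head=requests.head):
    """Map each URL to (final_status, [redirect statuses]), or None if the request failed.

    `head` is a hypothetical injection point; it defaults to requests.head.
    """
    results = {}
    for url in urls:
        try:
            # allow_redirects=True makes HEAD follow 3xx responses, as a browser would.
            r = head(url, allow_redirects=True, timeout=10)
        except requests.RequestException:
            # Covers connection errors, timeouts and redirect loops.
            results[url] = None
            continue
        # r.history holds the intermediate redirect responses, oldest first.
        hops = [h.status_code for h in r.history]
        results[url] = (r.status_code, hops)
    return results
```

Calling report_status(list_of_urls) and printing the result shows, for each URL, the final status code alongside any redirects that were followed on the way, so a 302 that ends in a 404 appears as (404, [302]).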