
HTTP 404 status code (Not Found) shown as 302

I'm trying to retrieve the HTTP status code for each URL in a list in Python, using the following piece of code:

import requests

try:
    r = requests.head(testpoint_url)
    # r.status_code is an int, so convert it before concatenating.
    print(testpoint_url + " : " + str(r.status_code))
except requests.ConnectionError:
    print("failed to connect")

Surprisingly, for some URLs I get a 302 status code, whereas opening the same URL in a browser shows a 404!


What is going on? How can I get the real status code (e.g. 404)?

302 is an HTTP redirect. A web browser follows the redirect to the URL reported in the Location response header. The request for that next URL gets its own response code, which can be 404.

Your Python code does not follow the redirect, which explains why it gets the original 302 instead.
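To see the redirect from the Python side, the 302 response itself can be inspected. The following is a minimal sketch rather than part of the original code: `peek_redirect` is a hypothetical helper name, and the 10-second timeout is an assumption.

```python
import requests


def peek_redirect(url, timeout=10):
    """Send a HEAD request without following redirects.

    Returns (status_code, location), where location is the value of the
    Location response header, or None if the header is absent.
    """
    r = requests.head(url, allow_redirects=False, timeout=timeout)
    return r.status_code, r.headers.get("Location")
```

For a URL that redirects, this should return the 3xx code together with the target the browser would request next; for a URL that does not redirect, the second element is usually None.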

Per the Requests documentation:

Redirection and History

By default Requests will perform location redirection for all verbs except HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

...

If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

>>> r = requests.get('https://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you're using HEAD, you can enable redirection as well:

>>> r = requests.head('https://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]

So, in your code, change this:

r = requests.head(testpoint_url)

To this:

r = requests.head(testpoint_url, allow_redirects=True)

Then r.status_code will be the final status code (i.e., 404) after all redirects have been followed.
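For a whole list of URLs, the same idea could be wrapped in a small helper that also records each hop via Response.history. This is a sketch under assumptions: `final_status` is a hypothetical name, the timeout value is arbitrary, and catching requests.RequestException (the base class of requests.ConnectionError) is a choice made here so that timeouts and redirect loops are reported the same way as connection failures.

```python
import requests


def final_status(url, timeout=10):
    """Follow redirects on a HEAD request and report the outcome.

    Returns (status_code, hops). hops is a list of (status_code, url)
    tuples, oldest first, ending with the final response.
    Returns (None, []) if the request fails.
    """
    try:
        r = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return None, []
    # r.history holds the intermediate redirect responses; r is the last one.
    hops = [(resp.status_code, resp.url) for resp in r.history + [r]]
    return r.status_code, hops
```

Calling final_status on each URL in the list and printing the first element should then give the code the browser ends up showing, while the hops list makes any intermediate 302 visible.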


