
HTTP 404 status code (Not Found) shown as 302

I'm trying to retrieve the HTTP status code for each URL in a list in Python, using the following piece of code:

import requests

try:
    r = requests.head(testpoint_url)
    # r.status_code is an int, so convert it before concatenating.
    print(testpoint_url + " : " + str(r.status_code))
except requests.ConnectionError:
    print("failed to connect")

Surprisingly, for some URLs I get a 302 status code, whereas opening the same URL in a browser shows a 404!


What is going on? How can I get the real status code (e.g., 404)?

302 is an HTTP redirect. A web browser follows the redirect to the URL given in the Location response header. The request for that next URL gets its own response code, which can be 404.

Your Python code does not follow the redirect, which would explain why it gets the original 302 instead.
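To make the mechanics concrete, here is a minimal sketch that uses only the standard library. It assumes a hypothetical local test server (DemoHandler, with made-up paths /old and /missing) and, for simplicity, only follows same-host redirects by path. It is an illustration of the behavior described above, not how Requests or any browser is actually implemented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /missing, which does not exist."""

    def do_HEAD(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/missing")
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def head_once(host, port, path):
    """Send a single HEAD request without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def head_following(host, port, path, max_hops=5):
    """Follow Location headers on the same host and collect each status."""
    statuses = []
    for _ in range(max_hops):
        status, location = head_once(host, port, path)
        statuses.append(status)
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        # Simplification: keep only the path part of the Location value.
        path = urlsplit(location).path or "/"
    return statuses


# Bind to port 0 so the OS picks a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

first_status, _ = head_once(host, port, "/old")
chain = head_following(host, port, "/old")

server.shutdown()
server.server_close()

print(first_status)  # what a client sees if it stops at the first response
print(chain)         # every status seen while following the redirect
```

The single request stops at the redirect, while the following variant also reports the status of the page the redirect points to.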

Per the Requests documentation:

Redirection and History

By default Requests will perform location redirection for all verbs except HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

...

If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

>>> r = requests.get('https://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you're using HEAD, you can enable redirection as well:

>>> r = requests.head('https://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]

So, in your code, change this:

r = requests.head(testpoint_url)

To this:

r = requests.head(testpoint_url, allow_redirects=True)

Then r.status_code will be the final status code (i.e., 404) after all redirects have been followed.
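If you are checking a whole list of URLs, something along these lines may be handy. It is only a sketch: final_status and check_urls are hypothetical helper names, the 10-second timeout is an arbitrary choice, and catching requests.RequestException (rather than only ConnectionError) is an assumption about which failures you want to skip. It returns the final status together with the status codes of any redirects taken from r.history.

```python
import requests


def final_status(session, url, timeout=10):
    """Return (final status code, list of redirect status codes) for url."""
    # HEAD does not follow redirects by default, so enable it explicitly.
    r = session.head(url, allow_redirects=True, timeout=timeout)
    return r.status_code, [hop.status_code for hop in r.history]


def check_urls(urls):
    """Map each URL to its result, or to None if the request failed."""
    results = {}
    # A Session reuses connections, which helps when checking many URLs.
    with requests.Session() as session:
        for url in urls:
            try:
                results[url] = final_status(session, url)
            except requests.RequestException:
                results[url] = None
    return results
```

For a URL that redirects once and then fails, check_urls([testpoint_url]) would give a pair such as (404, [302]) for that URL, so you can see both the final code and the redirect that hid it.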
