Error handling in the Python requests module when a URL does not exist

Question

I am trying to nail down error handling for the requests module in Python so that I am notified whenever a URL is unavailable, i.e. HTTPError, ConnectionError, Timeout, etc.

The issue I am having is that I seem to get a status of 200 even for fake URLs.

I have trawled through SO and various other web sources and tried many different ways of achieving the same goal, but have so far come up empty.

I have boiled the code down to the bare minimum to simplify things.

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = requests.get(url,timeout=1)
    try:
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)

I expected the first 3 URLs in the list to be classed as 'Website Error:' as they are URLs that I have just made up. The final URL in the list is quite obviously real, so it should be the only one listed as 'Website Good:'.

What actually happens is that the first URL is handled correctly, as it gives a response code of 503. The next two URLs, however, do not produce a status_code at all according to https://httpstatus.io/, which only displays ERROR with "Cannot find URI. another-fake-website.com another-fake-website.com:80".

So I expected all but the last URL in the list to be shown as 'Website Error:'.

OUTPUT

When running the script on a Raspberry Pi:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Error: ', 'http://fake-website.com', <Response [503]>)
('Website Good: ', 'http://another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://yet-another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://google.com', <Response [200]>)
>>>

If I enter all 4 URLs into https://httpstatus.io/, it shows a 503, a 200, and two URLs that have no status code and just display Error.

UPDATE

So I thought I would check this in Windows using PowerShell and followed this example: https://stackoverflow.com/a/52762602/5251044

This is the output:

c:\Testing>powershell -executionpolicy bypass -File .\AnyName.ps1
0 - http://fake-website.com
200 - http://another-fake-website.com
200 - http://yet-another-fake-website.com
200 - http://google.com

As you can see, I am no further forward.

UPDATE 2

After further discussions with Fozoro and trying various options with no fix in sight, I thought I would try this code using urllib2 instead of requests.

Here is the changed code:

from urllib2 import urlopen
import socket

urls = ['http://another-fake-website.com',
        'http://fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com',
        'dskjhkjdhskjh.com',
        'doioieowwros.com']

for url in urls:

    try:
        r  = urlopen(url, timeout = 5)
        r.getcode()
    except:
        pass
    if r.getcode() != 200:
        print ("Website Error: ", url, r.getcode())
    else:
        print ("Website Good: ", url, r.getcode())

Unfortunately the resulting output is still not correct, but it does differ slightly from the output of the previous code, see below:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Good: ', 'http://another-fake-website.com', 200)
('Website Good: ', 'http://fake-website.com', 200)
('Website Good: ', 'http://yet-another-fake-website.com', 200)
('Website Good: ', 'http://google.com', 200)
('Website Good: ', 'dskjhkjdhskjh.com', 200)
('Website Good: ', 'doioieowwros.com', 200)
>>>

This time it shows 200 for every response, which is very peculiar.

Answer 1

You should put r = requests.get(url,timeout=1) inside the try: block, because requests.get itself raises an exception (such as ConnectionError or Timeout) when no response arrives. So your code needs to look like this:

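import requests

urls = ['http://fake-website.com',
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    try:
        r = requests.get(url, timeout=1)
        r.raise_for_status()
    except requests.exceptions.HTTPError:
        # 4xx/5xx response: r is set and is reported by the check below
        pass
    except requests.exceptions.RequestException as e:
        # No response at all (DNS failure, refused connection, timeout)
        print("Website Error: ", url, e)
        continue
    if r.status_code != 200:
        print("Website Error: ", url, r)
    else:
        print("Website Good: ", url, r)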

Output:

Website Error:  http://fake-website.com <Response [503]>
Website Error:  http://another-fake-website.com <Response [503]>
Website Error:  http://yet-another-fake-website.com <Response [503]>
Website Good:  http://google.com <Response [200]>
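
Since the goal in the question is to tell HTTPError, ConnectionError and Timeout apart, a minimal sketch along those lines might look like the following. The function name check_url, its (ok, detail) return shape and the injectable get parameter are assumptions made for illustration; they are not part of requests or of the original code.

```python
import requests


def check_url(url, timeout=5, get=requests.get):
    """Return (ok, detail) for a URL instead of raising.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Checked before ConnectionError because ConnectTimeout is both.
        return False, "timed out"
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, unreachable host, ...
        return False, "connection failed"
    except requests.exceptions.RequestException as exc:
        # Anything else requests can raise, e.g. a URL with no scheme.
        return False, "request failed: {}".format(exc)
    if response.status_code != 200:
        return False, "HTTP status {}".format(response.status_code)
    return True, "HTTP status 200"
```

Calling check_url('http://google.com') uses the real requests.get. A URL without a scheme, such as 'dskjhkjdhskjh.com', would be expected to land in the last except branch rather than crash the loop.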

I hope this helps!

Answer 2

For me, the reason turned out to be a page served by my ISP saying that the URL is invalid. It is that page that returns the 200, not the fake site.

This can be verified by printing the content of the returned page with requests.get('http://fakesite').text
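
Another way to check for this kind of interception, sketched here under some assumptions, is to look up a hostname that cannot exist. The .invalid top-level domain is reserved and never delegated, so if such a name resolves, a resolver on the path (for example the ISP's) is probably substituting its own address. The helper names resolves and dns_is_intercepted and the injectable lookup parameter are hypothetical, chosen only for this example.

```python
import socket
import uuid


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address."""
    try:
        return bool(lookup(hostname, 80))
    except socket.gaierror:
        # Name resolution failed, which is the expected result for a fake host.
        return False


def dns_is_intercepted(lookup=socket.getaddrinfo):
    """Return True if a name that cannot exist still resolves.

    A random label under the reserved .invalid domain is used so the
    probe cannot collide with a real site.
    """
    probe = "probe-{}.invalid".format(uuid.uuid4().hex)
    return resolves(probe, lookup)
```

Calling dns_is_intercepted() with no arguments uses the system resolver. If it returns True, a 200 for a made-up URL most likely comes from the substituted page rather than from the site itself.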


Answer 1
2 2019-04-07 15:34:29

Answer 2
1 2021-01-01 21:13:19
