How to catch a 404 error in urllib.urlretrieve

Question

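Background: I am using urllib.urlretrieve, rather than any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.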

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])

However, urlretrieve is so dumb that it offers no way to detect the status of the HTTP request (e.g. was it a 404 or a 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items() 
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best-known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Answer 1

Check out the complete code of urllib.urlretrieve:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it is part of the public urllib API). You can override http_error_default to detect 404s:

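import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # handle errors the way you'd like to; raising here is one option,
    # so that retrieve() does not save the error page as the file
    raise IOError('http error', errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)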
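
The snippet above passes my_report_hook without defining it. As a rough sketch of what such a hook might look like, assuming only that the hook is called with a block count, a block size and a total size (the helper name format_progress and the bar layout are made up for illustration):

```python
import sys

def format_progress(block_count, block_size, total_size, width=30):
    """Build a one-line text progress bar from the values passed to a reporthook."""
    downloaded = block_count * block_size
    if total_size <= 0:
        # No Content-Length was reported, so only the byte count is known.
        return "%d bytes" % downloaded
    # The last block is usually partial, so cap the fraction at 100%.
    fraction = min(1.0, float(downloaded) / total_size)
    filled = int(fraction * width)
    bar = "#" * filled + "." * (width - filled)
    return "[%s] %3d%%" % (bar, int(fraction * 100))

def my_report_hook(block_count, block_size, total_size):
    """Hypothetical hook: redraw the bar on the same terminal line."""
    sys.stderr.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stderr.flush()
```

Keeping the formatting in a separate function makes the bar easy to check without downloading anything; the hook itself only writes the result to stderr.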

Answer 2

You should use:

import urllib2

try:
    resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
except urllib2.URLError, e:
    if not hasattr(e, "code"):
        raise
    resp = e

print "Gave", resp.code, resp.msg
print "=" * 80
print resp.read(80)

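Edit: The rationale here is that, unless you expect the exceptional status, it is an exception for it to happen, and you probably did not even think about it. So instead of letting your code continue to run after the request failed, the default behavior, quite sensibly, is to stop its execution.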
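
urllib2.urlopen has no reporthook parameter, so the progress bar from the question has to be driven by hand. One possible sketch, assuming the response is read in fixed-size blocks and the hook is called the same way urlretrieve calls it (copy_with_progress and its parameters are hypothetical names, not part of urllib2):

```python
def copy_with_progress(resp, out, reporthook=None, block_size=8192, total_size=-1):
    """Copy a file-like response into `out`, reporting progress after each block."""
    block_count = 0
    if reporthook:
        # Initial call with zero blocks, before anything has been read.
        reporthook(block_count, block_size, total_size)
    while True:
        block = resp.read(block_size)
        if not block:
            break
        out.write(block)
        block_count += 1
        if reporthook:
            reporthook(block_count, block_size, total_size)
    return block_count

# Possible usage with urllib2 on Python 2 (not executed here):
#   resp = urllib2.urlopen(url)          # raises urllib2.HTTPError on a 404
#   total = int(resp.info().get("Content-Length", -1))
#   with open(filename, "wb") as out:
#       copy_with_progress(resp, out, hook, total_size=total)
```

Because urlopen raises on a 404 before any data is read, the error never reaches the copy loop, and no error page ends up written to the output file.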

Answer 3

The URL Opener object's "retrieve" method supports reporthook and raises an exception on a 404.

http://docs.python.org/library/urllib.html#url-opener-objects

3 answers

Answer 1
28 2009-08-20 21:11:37

Answer 2
14 2010-02-04 20:17:57

Answer 3
2 2009-08-20 21:13:46
