Getting headers with the Python requests library

Question

I am using the Python requests library to fetch the headers of HTML pages and read the encoding from them. For some links, the request fails and no headers are returned. In those cases I would like to fall back to the encoding "utf-8". How do I handle the error raised by requests.head?

Here is my code:

r = requests.head(link) #how to handle error in case this fails?
charset = r.encoding
if (not charset):
    charset = "utf-8"

This is the error I get when requests fails to fetch the headers:

 File "parsexml.py", line 78, in parsefile
  r = requests.head(link)
 File "/usr/lib/python2.7/dist-packages/requests/api.py", line 74, in head
   return request('head', url, **kwargs)
 File "/usr/lib/python2.7/dist-packages/requests/api.py", line 40, in request
   return s.request(method=method, url=url, **kwargs)
 File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 229, in request
   r.send(prefetch=prefetch)
 File "/usr/lib/python2.7/dist-packages/requests/models.py", line 605, in send
   raise ConnectionError(e)
 requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.standardzilla.com', port=80): Max retries exceeded with url: /2008/08/01/diaries-of-a-freelancer-day-thirty-seven/

Answer 1

Put your code in a try-except block and catch requests.exceptions.ConnectionError, like this:

import requests

try:
    r = requests.head(link)
    # r.encoding may be None if the headers give no encoding
    charset = r.encoding or "utf-8"
except requests.exceptions.ConnectionError:
    # The request never completed, so fall back to the default encoding
    print('Unable to access ' + link)
    charset = "utf-8"

Solution 1
2 Accepted 2014-02-18 10:59:46
