UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' when using urllib.request in Python 3

Question

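I am writing a script that goes through a list of links and parses information from each page.

It works for most sites, but it chokes on some with "UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)".

It stops in client.py, which urllib calls into on Python 3.

The exact link is: http://finance.yahoo.com/news/cafés-growth-faster-than-fast-food-peers-144512056.html

There are quite a few similar posts here, but none of the answers seems to work for me.

My code is: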

import socket
from urllib import request
from urllib.error import HTTPError, URLError

def __request(link, debug=0):
    try:
        # Long timeout because many requests were timing out.
        html = request.urlopen(link, timeout=35).read()
        unicode_html = html.decode('utf-8', 'ignore')
    # NOTE: except HTTPError must come first, otherwise except URLError will also catch an HTTPError.
    except HTTPError as e:
        if debug:
            print("The server couldn't fulfill the request for " + link)
            print('Error code: ', e.code)
        return ''
    except URLError as e:
        if isinstance(e.reason, socket.timeout):
            print('timeout')
            return ''
    else:
        return unicode_html

This calls the request function:

link = 'http://finance.yahoo.com/news/cafés-growth-faster-than-fast-food-peers-144512056.html'
page = __request(link)

The traceback is:

Traceback (most recent call last):
  File "<string>", line 250, in run_nodebug
  File "C:\reader\get_news.py", line 276, in <module>
    main()
  File "C:\reader\get_news.py", line 255, in main
    body = get_article_body(item['link'],debug=0)
  File "C:\reader\get_news.py", line 155, in get_article_body
    page = __request('na',url)
  File "C:\reader\get_news.py", line 50, in __request
    html = request.urlopen(link, timeout=35).read()
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\Lib\urllib\request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python33\Lib\http\client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "C:\Python33\Lib\http\client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python33\Lib\http\client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)

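Any help is appreciated. This is driving me crazy; I think I have tried every combination of x.decode and the like.

(I could ignore the offending characters if that is possible.)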

Answer 1

Use a percent-encoded URL:

link = 'http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html'

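I found the percent-encoded URL above by pointing my browser at

http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html

going to the page, and then copying the encoded URL the browser supplied back into my text editor. However, you can also generate a percent-encoded URL programmatically: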

from urllib import parse

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'

scheme, netloc, path, query, fragment = parse.urlsplit(link)
path = parse.quote(path)
link = parse.urlunsplit((scheme, netloc, path, query, fragment))

This yields

http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html
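If the script processes a whole list of links, the same steps can be wrapped in a small helper. The following is only a sketch: the name percent_encode_url is hypothetical, and keeping '%' in the safe set is an assumption made so that links which are already percent-encoded are not encoded a second time (a stray literal '%' would therefore be left untouched). It also encodes the query string, which the snippet above does not, and it does not handle non-ASCII host names.

```python
from urllib import parse

def percent_encode_url(link):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration; host names are left untouched.
    """
    scheme, netloc, path, query, fragment = parse.urlsplit(link)
    # Keep '/' as the path separator and '%' so existing escapes survive.
    path = parse.quote(path, safe="/%")
    # Keep the usual query delimiters and existing escapes.
    query = parse.quote(query, safe="=&%+")
    return parse.urlunsplit((scheme, netloc, path, query, fragment))

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'
encoded = percent_encode_url(link)
print(encoded)
```

Because '%' is treated as safe, calling the helper on its own output returns the same string, so it can be applied to every link before it is passed to request.urlopen.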

Answer 2

Your URL contains characters that cannot be represented as ASCII.
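The failure can be reproduced without any network access. The last frame of the traceback shows http.client encoding the request line as ASCII; the sketch below builds a comparable request line by hand (the exact string is an assumption modelled on the traceback, using the path from the question) and shows where the encoding breaks.

```python
# Approximation of the request line that http.client tries to send.
request_line = 'GET /news/cafés-growth-faster-than-fast-food-peers-144512056.html HTTP/1.1'

try:
    request_line.encode('ascii')
except UnicodeEncodeError as e:
    position = e.start                      # index of the first unencodable character
    failing_char = request_line[position]
    print('cannot encode', repr(failing_char), 'at position', position)
```

The offending character is the 'é' in "cafés", which sits at index 13 of the request line, matching the position reported in the error message.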

You must make sure that all characters are properly URL-encoded, for example with urllib.parse.quote_plus; it represents any non-ASCII character using UTF-8 percent-encoded escapes.
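As a rough illustration of that suggestion, the sketch below applies quote_plus to the last path segment only. Splitting the URL by hand into base and segment is an assumption made for the example; the point is that quote_plus also escapes ':' and '/', so passing it the whole URL would not give a usable link.

```python
from urllib.parse import quote_plus

base = 'http://finance.yahoo.com/news/'
segment = 'cafés-growing-faster-than-fast-food-peers-144512056.html'

# Encode only the segment that contains the non-ASCII character.
encoded_segment = quote_plus(segment)
link = base + encoded_segment
print(link)

# Encoding the full URL escapes the scheme and path separators as well.
whole = quote_plus(base + segment)
print(whole)
```

Note that quote_plus turns spaces into '+', which is the convention for query strings; for path segments that may contain spaces, urllib.parse.quote (which produces '%20') is usually the safer choice.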
