In Python, how to encode/decode Unicode characters such as ö
Using Python 2.6.6 on CentOS 6.4
import json
import urllib2
url = 'http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e'
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
opener.addheaders = [('Accept-Charset', 'utf-8')]
response = opener.open(url)
page = response.read()
print page
Result:
...<suggestion data="how to pronounce eyjafjallaj
Python terminates without any error message.
I think it dies because the next character is ö:
<toplevel>
<CompleteSuggestion>
<suggestion data="how to pronounce edinburgh"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce elle"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edith"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce et al"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eunice"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce english names"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edamame"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce erudite"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eyjafjallajökull"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce either"/>
</CompleteSuggestion>
</toplevel>
http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e
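This looks like a Unicode problem. I tried encode('utf-8') and decode('utf-8') in various ways, but it still dies. Any ideas?
PS: It seems I need to use urllib2 rather than urllib, because urllib ignores cookies, which causes other problems.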
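response.read() returns a bytestring. Python should not die when printing a bytestring, because no character conversion takes place; the bytes are printed as is.
You could try printing Unicode instead: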
text = page.decode(response.info().getparam('charset') or 'utf-8')
print text
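To see what the decode step does without making a network request, here is a minimal sketch that runs on a hard-coded byte string instead of a live response. SAMPLE_BODY and decode_body are hypothetical names made up for illustration, and the sample assumes the server sends UTF-8; with a real response you would pass response.info().getparam('charset') as the charset.

```python
# -*- coding: utf-8 -*-
# Sketch only: SAMPLE_BODY and decode_body are hypothetical names,
# not part of urllib2.

# Bytes as response.read() would return them, assuming a UTF-8 response:
# "ö" is the two bytes 0xC3 0xB6.
SAMPLE_BODY = b'<suggestion data="how to pronounce eyjafjallaj\xc3\xb6kull"/>'


def decode_body(raw, charset=None, default='utf-8'):
    """Decode raw bytes with the declared charset, or the default if none."""
    return raw.decode(charset or default)


# Unicode text: "ö" is now a single character rather than two bytes.
text = decode_body(SAMPLE_BODY)

# Encoding explicitly gives back bytes in a known encoding, independent of
# whatever the terminal or locale would pick when printing.
encoded = text.encode('utf-8')

byte_count = len(SAMPLE_BODY)
char_count = len(text)
```

Here char_count is one less than byte_count, because the two UTF-8 bytes of ö collapse into one character. If printing the decoded text still fails on a terminal with a non-UTF-8 locale, writing the explicitly encoded bytes is one thing to try.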