Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'

Question

So, I have this code to fetch a JSON string from a URL:

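import json
import urllib2  # Python 2 standard library

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()   # raw bytes of the response body
data = json.loads(string)  # JSON strings come back as unicode objects

for x in data:
    print x['foo']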

The problem is x['foo']: if I try to print it as shown above, I get this error:

Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8") I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try encode('ascii', 'ignore').decode('ascii'), I get this error:

x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

Answer 1

x['foo'].decode("utf-8") resulting in UnicodeEncodeError means that x['foo'] is of type unicode. str.decode takes a str and translates it to unicode. Python 2 is trying to be helpful here and attempts to implicitly convert your unicode to str so that you can call decode on it. It does this with the default encoding reported by sys.getdefaultencoding(), which is ascii, which can't encode all of Unicode, hence the exception.
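To illustrate that hidden step, here is a minimal sketch (not code from the question). It spells out the ascii encode that Python 2 performs implicitly before decode runs, using the character from the error message as a sample value. Writing the encode explicitly makes the snippet behave the same on Python 2 and Python 3.

```python
value = u'\u4e3a'  # the character from the error message

try:
    # Python 2 does this implicitly when decode() is called on a unicode object
    value.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(repr(error))
```

The exception comes from the encode step, which is why a call to decode ends up reporting a UnicodeEncodeError.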

The solution here is to remove the decode call: the value is already unicode.

Read Ned Batchelder's presentation, Pragmatic Unicode. It will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting here that every string returned by json.load or json.loads will be unicode and not str.
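A quick way to check this, as a sketch with a made-up payload standing in for the response body (the payload and the name text_type are assumptions, not part of the question). It also shows that a JSON null comes back as None, which would explain the AttributeError in the question.

```python
import json

text_type = type(u'')  # unicode on Python 2, str on Python 3

# Hypothetical payload; the real one comes from the URL in the question
payload = '[{"foo": "\\u4e3a Co"}, {"foo": null}]'
data = json.loads(payload)

first = data[0]['foo']
print(isinstance(first, text_type))  # strings are already decoded text
print(data[1]['foo'] is None)        # JSON null becomes None
```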

Addressing the new question after edits:

When you print, you need bytes; unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes. In Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate from the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.

This should work:

print x['foo'].encode('utf-8')
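The AttributeError in the question suggests that some x['foo'] values are None, so encode cannot be called on every item unconditionally. A minimal sketch of one way to guard against that, assuming an empty byte string is an acceptable stand-in for None; to_utf8 and rows are hypothetical names with made-up sample data.

```python
def to_utf8(value):
    """Return UTF-8 bytes for a text value, or b'' for None (hypothetical helper)."""
    if value is None:
        return b''
    return value.encode('utf-8')


rows = [{'foo': u'\u4e3a Co'}, {'foo': None}]  # made-up sample data
encoded = [to_utf8(row['foo']) for row in rows]
print(encoded)
```

Skipping None rows entirely is an equally reasonable choice; which one fits depends on what the output is used for.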

Answer 1 (2, accepted), 2015-10-07 12:49:29
