Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'
So, I have this code to fetch a JSON string from a URL:
import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()
data = json.loads(string)
for x in data:
    print x['foo']
The problem is x['foo']: if I try to print it as shown above, I get this error:
Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1
If I use x['foo'].decode("utf-8"), I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)
If I try x['foo'].encode('ascii', 'ignore').decode('ascii'), then I get this error:
x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'
Is there any way to fix this problem?
The fact that x['foo'].decode("utf-8") results in a UnicodeEncodeError means that x['foo'] is already of type unicode. str.decode takes a str (a byte string) and translates it to the unicode type.
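The direction of the conversion can be seen in a short sketch. It is written for Python 3, where the names differ from the question's Python 2 code: Python 3's `bytes` corresponds to Python 2's `str`, and Python 3's `str` corresponds to Python 2's `unicode`.

```python
# Python 3 sketch: bytes.decode turns encoded bytes into text.
# (Python 2 spells the same operation str.decode, returning unicode.)
raw = b'\xe4\xb8\xba'           # UTF-8 encoding of U+4E3A
text = raw.decode('utf-8')      # bytes -> str (text)

print(type(raw).__name__, '->', type(text).__name__)  # bytes -> str
print(text == '\u4e3a')                               # True

# Python 3 drops the implicit conversion: text has no .decode method at all.
print(hasattr(text, 'decode'))                        # False
```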
Python 2 is trying to be helpful here: it attempts to implicitly convert your unicode to str so that you can call decode on it. It does this with the default encoding, sys.getdefaultencoding(), which is ascii. ASCII can only encode code points 0 to 127, not all of Unicode, hence the exception.
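Python 3 no longer performs this implicit conversion, so the sketch below spells it out with a hypothetical helper, `py2_style_decode` (not a real library function). It first encodes text with ASCII, standing in for Python 2's default encoding, and only then decodes, which reproduces the same error message for a non-ASCII character.

```python
def py2_style_decode(value, encoding='utf-8', default_encoding='ascii'):
    """Hypothetical helper mimicking Python 2's unicode.decode behaviour."""
    if isinstance(value, str):
        # The step Python 2 does silently: text -> bytes via the default encoding.
        value = value.encode(default_encoding)
    # The step the caller actually asked for: bytes -> text.
    return value.decode(encoding)


print(py2_style_decode('plain ascii'))  # plain ascii

try:
    py2_style_decode('\u4e3a')
except UnicodeEncodeError as exc:
    # 'ascii' codec can't encode character '\u4e3a' in position 0: ordinal not in range(128)
    print(exc)
```

Pure-ASCII text survives the hidden encode step, which is why this kind of bug often goes unnoticed until non-ASCII data arrives.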
The solution here is to remove the decode call: the value is already unicode.
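If the same code might receive either encoded bytes or already-decoded text, a type check avoids decoding twice. A minimal Python 3 sketch; `ensure_text` is a hypothetical name, not a standard-library function.

```python
def ensure_text(value, encoding='utf-8'):
    """Hypothetical helper: decode only if value is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or None): decoding again is exactly the bug described above.
    return value


print(ensure_text(b'\xe4\xb8\xba') == '\u4e3a')  # True
print(ensure_text('\u4e3a') == '\u4e3a')         # True
```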
Read Ned Batchelder's presentation, Pragmatic Unicode. It will greatly enhance your understanding of this and help prevent similar errors in the future.
It's worth noting here that every string value returned by json.load (or json.loads, as used in the question) will be unicode and not str.
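A quick Python 3 check of what `json.loads` hands back: JSON strings become text, and a JSON `null` becomes `None`. A `None` value has no `.encode` method, which would also be consistent with the `'NoneType' object has no attribute 'encode'` error in the question (an inference, since the actual data isn't shown). The sample JSON below is made up for illustration.

```python
import json

# Made-up sample data: one string value and one null value.
data = json.loads('[{"foo": "\\u4e3a Co"}, {"foo": null}]')

for x in data:
    print(type(x['foo']).__name__)
# str
# NoneType
```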
Addressing the new question after the edits:
When you print, you need bytes; unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes. In Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.
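The encode step, sketched in Python 3 (where text is `str` and the result is `bytes`): the single character U+4E3A becomes the three bytes E4 B8 BA, the same sequence that appears in the warning quoted in the question.

```python
text = '\u4e3a Co'
encoded = text.encode('utf-8')          # str (text) -> bytes

print(encoded)                          # b'\xe4\xb8\xba Co'
print(encoded.decode('utf-8') == text)  # True: decoding reverses encoding
```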
This should work:
print x['foo'].encode('utf-8')
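In Python 3, `print` accepts text directly and encodes it with the output stream's encoding (`sys.stdout.encoding`), so an explicit `.encode('utf-8')` is mainly useful when writing to a binary stream such as `sys.stdout.buffer`. A minimal sketch under that assumption: `write_foo_values` is a hypothetical helper, it assumes each item is a JSON object, and it skips items whose `foo` is `null` (an assumption prompted by the `NoneType` error in the question). `io.BytesIO` stands in for `sys.stdout.buffer` so the written bytes can be inspected.

```python
import io
import json
import sys


def write_foo_values(json_text, out=None):
    """Hypothetical helper: write each item's 'foo' value as UTF-8 bytes.

    `out` is any binary stream; it defaults to sys.stdout.buffer.
    """
    if out is None:
        out = sys.stdout.buffer
    for x in json.loads(json_text):      # assumption: each item is a dict
        value = x.get('foo')
        if value is None:                # assumption: skip items whose 'foo' is null
            continue
        out.write(value.encode('utf-8') + b'\n')


buf = io.BytesIO()
write_foo_values('[{"foo": "\\u4e3a Co"}, {"foo": null}]', buf)
print(buf.getvalue())  # b'\xe4\xb8\xba Co\n'
```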
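Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.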