Why does Python 2.x throw an exception with string formatting + unicode?

I have the following piece of code. The last line throws an error. Why is that?

class Foo(object):

    def __unicode__(self):
        return u'\u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45'

    def __str__(self):
        # In Python 2, str(Foo()) returns the UTF-8 encoded *bytes* of the unicode value
        return self.__unicode__().encode('utf-8')

print "this works %s" % (u'asdf')
print "this works %s" % (Foo(),)
print "this works %s %s" % (Foo(), 'asdf')
print

print "this also works {0} {1}".format(Foo(), u'asdf')
print
print "this should break %s %s" % (Foo(), u'asdf')  # raises UnicodeDecodeError

The error is "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 18: ordinal not in range(128)".

When you mix unicode and byte-string (str) objects, Python 2 implicitly tries to encode unicode values to byte strings, or to decode byte strings to unicode. These implicit conversions use the default ASCII codec, so they only succeed when the data is pure ASCII.
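To make the decode direction of that rule concrete, here is a minimal Python 3 sketch. Python 3 raises `TypeError` instead of coercing, so the Python 2 behaviour has to be written out by hand. The helper name `py2_style_concat` is hypothetical. It assumes the default codec is ASCII, which is what a stock Python 2 reports from `sys.getdefaultencoding()`.

```python
def py2_style_concat(left, right):
    """Hypothetical helper: mimic Python 2's implicit coercion for str + unicode.

    Whenever a byte string meets a text string, the bytes are decoded with
    the ASCII codec (assumed to be Python 2's default encoding) first.
    """
    if isinstance(left, bytes) and isinstance(right, str):
        left = left.decode('ascii')
    elif isinstance(left, str) and isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right


# ASCII-only bytes decode cleanly, so mixing them with text "just works".
ascii_result = py2_style_concat(b'plain ', u'asdf')

# UTF-8 bytes outside the ASCII range make the implicit decode fail.
try:
    py2_style_concat(u'\u6797'.encode('utf-8'), u'asdf')
    non_ascii_failed = False
except UnicodeDecodeError:
    non_ascii_failed = True

print(ascii_result, non_ascii_failed)  # plain asdf True
```

This is why mixing types often seems to work during testing with ASCII data, then breaks as soon as non-ASCII bytes show up.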

You are mixing unicode strings, byte strings and a custom object, and that triggers a sequence of encodings and decodings that fails.

In this case, your Foo() value is interpolated as a byte string (str(Foo()) is used). The u'asdf' interpolation then triggers a decode of the result built so far, which already contains the UTF-8 bytes of the Foo() value, so that the unicode string can be interpolated. This decode fails because the ASCII codec cannot decode the UTF-8 byte sequence \xe6\x9e\x97 that was already interpolated. The byte 0xe6 sits at position 18, immediately after the 18-character prefix "this should break ".
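The failing step can be reproduced outside the `%` operator. The Python 3 sketch below builds the byte-string result so far by hand and performs the ASCII decode that Python 2 does implicitly. The variable names are illustrative only. The resulting message matches the one in the question, including position 18.

```python
# UTF-8 bytes that str(Foo()) returns in the question's Python 2 code.
foo_bytes = u'\u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45'.encode('utf-8')

# Byte-string result built so far, before u'asdf' forces the switch to unicode.
partial = b"this should break " + foo_bytes + b" "

# Python 2 decodes `partial` with ASCII at this point; here we do it explicitly.
try:
    partial.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
    bad_position = exc.start                         # index of the first non-ASCII byte
    bad_byte = partial[exc.start:exc.start + 1]      # that byte, as a 1-byte bytes object

print(error_message)
# 'ascii' codec can't decode byte 0xe6 in position 18: ordinal not in range(128)
```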

You should always explicitly encode Unicode values to byte strings, or decode byte strings to Unicode, before mixing the two types. The corner cases of the implicit conversions are complex.
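One way to follow that advice is to normalise every argument to text before formatting, using a codec you name yourself. Below is a minimal Python 3 sketch. `to_text` is a hypothetical helper, not a library function, and UTF-8 is assumed to be the encoding of any incoming bytes.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes explicitly, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


foo_bytes = u'\u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45'.encode('utf-8')

# Every argument is converted to text *before* formatting, with a named codec,
# so no implicit ASCII decode can happen during interpolation.
message = u"this should break %s %s" % (to_text(foo_bytes), to_text(u'asdf'))
```

Doing the conversion once at the boundary keeps the formatting code itself free of bytes/text decisions.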

Explicitly converting with unicode() works:

>>> print "this should break %s %s" % (unicode(Foo()), u'asdf')
this should break 林覺民謝冰心故居 asdf

because the result is then a unicode string:

>>> "this should break %s %s" % (unicode(Foo()), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'

whereas otherwise you'd end up with a byte string:

>>> "this should break %s %s" % (Foo(), 'asdf')
'this should break \xe6\x9e\x97\xe8\xa6\xba\xe6\xb0\x91\xe8\xac\x9d\xe5\x86\xb0\xe5\xbf\x83\xe6\x95\x85\xe5\xb1\x85 asdf'

(Note that 'asdf' is left as a byte string too.)
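If you do end up with such a byte string, you can decode it explicitly with the codec it was actually written in, rather than letting ASCII be assumed. The Python 3 sketch below shows the same "stay in bytes, decode once" idea. It assumes Python 3.5+ for `%`-formatting on bytes (PEP 461).

```python
foo_bytes = u'\u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45'.encode('utf-8')

# Keep every piece as bytes: bytes %-formatting never decodes anything.
raw = b"this should break %s %s" % (foo_bytes, b'asdf')

# Decode once, at the end, with the codec the bytes were written in.
text = raw.decode('utf-8')
```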

Alternatively, use a unicode template:

>>> u"this should break %s %s" % (Foo(), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'
