Why does Python 2.x throw an exception with string formatting + unicode?

Question

I have the following code. The last line raises an error. Why is that?

class Foo(object):

    def __unicode__(self):
        return u'\u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45'

    def __str__(self):
        return self.__unicode__().encode('utf-8')

print "this works %s" % (u'asdf')
print "this works %s" % (Foo(),)
print "this works %s %s" % (Foo(), 'asdf')
print

print "this also works {0} {1}".format(Foo(), u'asdf')
print
print "this should break %s %s" % (Foo(), u'asdf')

The error is "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 18: ordinal not in range(128)".

Answer 1

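When you mix unicode and byte string objects, Python 2 implicitly tries either to encode the unicode value to a byte string or to decode the byte string to unicode, using the default ASCII codec.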

You are mixing unicode, byte strings and a custom object, and that triggers a sequence of encodes and decodes that do not combine cleanly.

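In this case, your Foo() value is interpolated as a byte string (str(Foo()) is used), and then interpolating u'asdf' triggers a decode of the template so far (which by now contains the UTF-8 Foo() value) so that the unicode string can be interpolated. This decode fails because the ASCII codec cannot decode the \xe6\x9e\x97 UTF-8 byte sequence that has already been interpolated.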
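The failing step can be reproduced explicitly. The following is a minimal sketch, not the interpreter's actual code path: it assumes the partially formatted result consists of the template prefix followed by the first UTF-8 bytes of the Foo() value, and then performs by hand the ASCII decode that Python 2 attempts implicitly. The variable names `partial` and `error` are introduced here for illustration only. The snippet runs on both Python 2 and Python 3.

```python
# Bytes assumed to be already interpolated when the unicode argument is reached:
# the 18-character template prefix plus the start of the UTF-8 encoded Foo() value.
partial = b'this should break ' + u'\u6797\u89ba\u6c11'.encode('utf-8')

error = None
try:
    # Python 2 performs this decode implicitly with the default ASCII codec.
    partial.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; Python 3 unbinds `exc` after the except block

print(error.encoding)                            # codec that failed
print(error.start)                               # index of the offending byte
print(repr(partial[error.start:error.start + 1]))
```

The offending index matches "position 18" in the error message from the question, because the prefix "this should break " is 18 characters long and the first byte of the UTF-8 data is 0xe6.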

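You should always explicitly encode Unicode values to byte strings, or decode byte strings to Unicode, before mixing types, because the corner cases are complex.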
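One way to follow this advice is to convert everything to a single type at the boundary before formatting. The sketch below is only an illustration under stated assumptions: `to_text` and `to_bytes` are hypothetical helper names, not part of the standard library, and UTF-8 is assumed as the encoding. It runs on both Python 2 and Python 3, since `bytes` is an alias for `str` in Python 2.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return a unicode string, decoding bytes explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Hypothetical helper: return a byte string, encoding unicode explicitly."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


name_bytes = b'\xe6\x9e\x97'  # UTF-8 bytes for u'\u6797'

# Decode first, then format with a unicode template: no implicit ASCII decode occurs.
combined = u'%s %s' % (to_text(name_bytes), u'asdf')

# Encode once, at the output boundary, if bytes are needed.
encoded = to_bytes(combined)
```

Here `combined` is a unicode string and `encoded` is its UTF-8 byte form; neither step relies on the default ASCII codec.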

Explicitly converting to unicode() works:

>>> print "this should break %s %s" % (unicode(Foo()), u'asdf')
this should break 林覺民謝冰心故居 asdf

because the output is then a unicode string:

>>> "this should break %s %s" % (unicode(Foo()), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'

Otherwise you end up with a byte string:

>>> "this should break %s %s" % (Foo(), 'asdf')
'this should break \xe6\x9e\x97\xe8\xa6\xba\xe6\xb0\x91\xe8\xac\x9d\xe5\x86\xb0\xe5\xbf\x83\xe6\x95\x85\xe5\xb1\x85 asdf'

(Note that asdf is left as a byte string here too.)

Alternatively, use a unicode template:

>>> u"this should break %s %s" % (Foo(), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'

Answer 1
3 accepted 2014-03-20 15:35:50