
Python print failing to print Unicode and string at the same time

Below are a few cases I observed. I'd like to know why Python's print behaves like this, and what the possible fixes are.

>>> print "%s" % u"abc" # works
>>> print "%s" % "\xd1\x81" # works
>>> print "%s %s" % (u"abc", "\xd1\x81") # Error

For the last one, I get: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)

But this works:

>>> print "%s %s" % ("abc", "\xd1\x81") # works

And when I do:

>>> print "%s %s" % (u"abc", u"\u0441") # Error

it raises UnicodeEncodeError: 'charmap' codec can't encode character u'\u0441' in position 4: character maps to <undefined>

When you mix Unicode strings and byte strings in Python 2, the byte strings are implicitly coerced to Unicode using the default ascii codec. If this fails, you get a UnicodeDecodeError.
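Python 3 no longer performs this coercion implicitly. You can still reproduce the step Python 2 takes behind the scenes and see the same error. Below is a minimal Python 3 sketch. The helper name `coerce_like_py2` is hypothetical and is chosen only for illustration.

```python
def coerce_like_py2(value):
    """Hypothetical helper: mimic Python 2's implicit str -> unicode coercion."""
    if isinstance(value, bytes):
        # Python 2 used the default codec, ASCII; any byte >= 0x80 fails here.
        return value.decode("ascii")
    return value


print(coerce_like_py2(b"abc"))  # abc

try:
    coerce_like_py2(b"\xd1\x81")
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)
    print(exc)
```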

When you print Unicode strings, they are implicitly encoded in the current output encoding. If this fails, you get a UnicodeEncodeError.
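The encoding step can be reproduced explicitly. The following is a Python 3 sketch. It assumes cp437 as an example of a console code page that has no Cyrillic letters; the real console encoding depends on the system. The failing encode raises the same kind of 'charmap' error shown in the question.

```python
text = "abc \u0441"  # "abc " followed by CYRILLIC SMALL LETTER ES

# UTF-8 can represent every code point, so this always succeeds.
print(text.encode("utf-8"))  # b'abc \xd1\x81'

# cp437 is only an assumed stand-in for a non-UTF-8 console code page.
try:
    text.encode("cp437")
except UnicodeEncodeError as exc:
    # 'charmap' codec can't encode character '\u0441' in position 4: ...
    print(exc)
```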

So:

>>> print "%s" % u"abc"

is really:

>>> print unicode("%s",'ascii') % u"abc" # which is valid

But the following only "works" in the sense that it doesn't raise an error. If you expect it to print the U+0441 character, it will do so only if the output encoding is UTF-8. It prints garbage on my Windows system.

>>> print "%s" % "\xd1\x81"
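A byte string carries no encoding of its own. What appears on screen depends on how the terminal interprets the bytes. The following Python 3 sketch decodes the same two bytes in two ways. It again assumes cp437 as a stand-in for a non-UTF-8 Windows console code page; the answerer's actual console encoding is not stated.

```python
raw = b"\xd1\x81"  # the UTF-8 encoding of U+0441

as_utf8 = raw.decode("utf-8")   # one character, U+0441
as_cp437 = raw.decode("cp437")  # two unrelated characters (mojibake)

print(hex(ord(as_utf8)))  # 0x441
print(len(as_cp437))      # 2
```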

The following gives an error because of the implicit Unicode decoding:

print "%s %s" % (u"abc", "\xd1\x81")

which is really:

print unicode("%s %s",'ascii') % (u"abc", unicode("\xd1\x81",'ascii'))

\xd1 and \x81 are outside the ASCII range 0x00-0x7F.
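In Python 2 terms, the usual fix is to decode the byte string yourself before mixing it with Unicode, for example `"\xd1\x81".decode("utf-8")`. This assumes the bytes really are UTF-8. The sketch below is the Python 3 equivalent, and it also checks the byte range.

```python
raw = b"\xd1\x81"

# Every byte here is above 0x7F, so the ASCII codec cannot decode it.
print([hex(b) for b in raw])       # ['0xd1', '0x81']
print(all(b > 0x7F for b in raw))  # True

# Fix: decode with the encoding the bytes actually use (assumed UTF-8 here)
# before mixing them with text.
line = "%s %s" % ("abc", raw.decode("utf-8"))
print(line == "abc \u0441")  # True
```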

The last error implies that your output encoding is not UTF-8: the character could not be encoded in the output encoding used for printing, whereas UTF-8 can encode every Unicode character.

This is correct. When you output, you have to encode your unicode object to the desired character encoding, i.e. UTF-8 or whatever you need. Think of unicode (including all u"" literals) as an abstraction that has to be encoded to something like UTF-8 before serialisation.

You can encode a unicode object s to UTF-8 with s.encode('utf-8'). str objects in Python 2 are byte strings, so you do not get an error with something like "\xd1\x81": it is already encoded.
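The "encode at the output boundary" advice can be sketched in Python 3. In this sketch the text stays as `str` until the moment it is written, and then it is encoded explicitly. The helper `write_text` is hypothetical, and `io.BytesIO` stands in for any binary destination such as a file opened with "wb".

```python
import io


def write_text(binary_stream, text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then write the bytes."""
    binary_stream.write(text.encode(encoding))


sink = io.BytesIO()  # stand-in for a binary file or other byte sink
write_text(sink, "abc \u0441\n")
print(sink.getvalue())  # b'abc \xd1\x81\n'
```

For files, `open(path, "w", encoding="utf-8")` performs the same encoding step for you on every write.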

I would recommend using Python 3 rather than Python 2, as this is a bit more intuitive there.
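One way Python 3 is more predictable: mixing `str` and `bytes` by concatenation fails every time, whatever the byte values. Python 2 only failed when a byte happened to be outside ASCII. %-formatting is a pitfall to be aware of, though. It does not raise; instead it silently inserts the bytes' repr. A short sketch:

```python
text = "abc"
raw = b"\xd1\x81"

try:
    text + raw
except TypeError as exc:
    # Python 3 refuses to concatenate str and bytes, regardless of the bytes.
    print(exc)

# %-formatting does not raise; it inserts the repr of the bytes instead,
# so explicit decoding is still needed.
print("%s %s" % (text, raw))  # abc b'\xd1\x81'
print("%s %s" % (text, raw.decode("utf-8")) == "abc \u0441")  # True
```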

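Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.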

 
© 2020-2024 STACKOOM.COM