Python print cannot print Unicode and byte strings together

Question

Below are a few cases I observed. I'd like to know why Python's print behaves like this, and what the possible fixes are.

>>> print "%s" % u"abc" # works
>>> print "%s" % "\xd1\x81" # works
>>> print "%s %s" % (u"abc", "\xd1\x81") # Error

For the last one above, I get: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)

But this works:

>>> print "%s %s" % ("abc", "\xd17\x81") # works

And when I do:

>>> print "%s %s" % (u"abc", u"\u0441") # Error

it raises UnicodeEncodeError: 'charmap' codec can't encode character u'\u0441' in position 4: character maps to <undefined>

Answer 1

When you mix Unicode strings and byte strings in Python 2, the byte strings are implicitly coerced to Unicode using the default ascii codec. You will get UnicodeDecodeError if this fails.
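The implicit coercion can be made visible by doing the decode by hand. This is a minimal sketch, assuming the default codec is ascii as described above; the helper name `coerce_to_unicode` is hypothetical and only mimics what Python 2 does internally. It is written so that it also runs on Python 3, where the step always has to be spelled out.

```python
def coerce_to_unicode(byte_string, codec="ascii"):
    """Hypothetical helper: decode bytes the way Python 2 does implicitly."""
    return byte_string.decode(codec)

# Pure ASCII bytes decode without trouble.
ascii_result = coerce_to_unicode(b"abc")

# 0xd1 is not valid ASCII, so the same step raises UnicodeDecodeError.
try:
    coerce_to_unicode(b"\xd1\x81")
    decode_failed = False
    message = ""
except UnicodeDecodeError as exc:
    decode_failed = True
    message = str(exc)

print(ascii_result)
print(decode_failed)
print(message)
```

The message printed for the failing case names the same byte (0xd1) and position (0) as the error in the question.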

When you print Unicode strings, they are implicitly encoded to the current output encoding. You will get UnicodeEncodeError if this fails.
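The encoding step can be reproduced without a terminal by calling encode directly. This is a rough sketch: cp1252 is used only as an assumed stand-in for a Windows output encoding that has no Cyrillic letters, and the actual code page on a given system may differ.

```python
text = u"abc \u0441"

# UTF-8 can encode all Unicode characters, so this succeeds.
utf8_bytes = text.encode("utf-8")

# cp1252 has no mapping for U+0441, so encoding raises UnicodeEncodeError.
try:
    text.encode("cp1252")
    encode_failed = False
    bad_position = None
except UnicodeEncodeError as exc:
    encode_failed = True
    bad_position = exc.start  # index of the character that could not be encoded

print(encode_failed)
print(bad_position)
```

The failing index corresponds to the "position 4" reported in the question's UnicodeEncodeError.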

So:

>>> print "%s" % u"abc"

is really:

>>> print unicode("%s",'ascii') % u"abc" # and valid

But the following only works if by "works" you mean "doesn't throw an error". It prints the U+0441 character only if the output encoding is UTF-8. It prints garbage on my Windows system.

>>> print "%s" % "\xd1\x81"
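Why the same two bytes show up as one Cyrillic letter on one system and as garbage on another can be seen by decoding them under two encodings. This is only an illustration: latin-1 is an assumed example of a non-UTF-8 encoding, not necessarily the one a particular Windows console uses.

```python
raw = b"\xd1\x81"

# Interpreted as UTF-8, the two bytes form the single character U+0441.
as_utf8 = raw.decode("utf-8")

# Interpreted as latin-1, each byte becomes its own unrelated character.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8))
print(len(as_latin1))
```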

The following gives an error because of the implicit Unicode decoding:

print "%s %s" % (u"abc", "\xd1\x81")

which is really:

print unicode("%s %s",'ascii') % (u"abc", unicode("\xd1\x81",'ascii'))

The bytes 0xD1 and 0x81 are outside the ASCII range of 00h-7Fh.
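One possible fix for the mixed case is to decode the byte string yourself, with the codec it was really written in, before formatting. This sketch assumes the bytes are UTF-8; that is an assumption about the data, not something the error message can tell you.

```python
raw = b"\xd1\x81"  # assumed to be UTF-8 encoded text

# Decode explicitly instead of relying on the implicit ascii decode.
decoded = raw.decode("utf-8")

# Both operands are now Unicode, so no implicit coercion is needed.
combined = u"%s %s" % (u"abc", decoded)

print(len(combined))
```

Printing `combined` itself can still fail at the output stage if the output encoding cannot represent U+0441.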

The last error implies that your output encoding is not UTF-8, because it couldn't encode u'\u0441' to a character supported by the output encoding for printing. UTF-8 can encode all Unicode characters.

Answer 2

This is correct. When you output, you have to encode your unicode object to the desired character encoding, e.g. utf-8. Think of unicode (including all u"" literals) as an abstraction that has to be encoded to something like utf-8 prior to serialisation.

You can encode a unicode object s to utf-8 with s.encode('utf-8'). str objects in Python 2 are byte strings, so you do not get an error with things like "\xd17\x81": they are already encoded.
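A short sketch of encoding explicitly before output. The "replace" error handler and the cp1252 target are assumptions added for illustration (they are not part of the answer above); they show one way to avoid the exception when the output encoding lacks a character, at the cost of losing it.

```python
s = u"abc \u0441"

# Explicit encode to UTF-8, as suggested above.
encoded = s.encode("utf-8")

# With the "replace" handler, unencodable characters become "?" instead of raising.
fallback = s.encode("cp1252", "replace")

print(len(encoded))
print(len(fallback))
```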

I would recommend using Python 3 rather than Python 2, since this is a bit more intuitive there.

Answer 1
2 2015-09-24 18:25:03

Answer 2
0 2015-09-24 17:12:24