How to make python 3 print() utf8

How can I make Python 3 (3.1) print("Some text") to stdout in UTF-8, or how can I output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the characters with their byte values, written as [x00]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point in using encoded text with print (since it always shows only the representation of the bytes, not the real bytes), and it's impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding anyway.

For example, print(chr(255)) throws an error:

 Traceback (most recent call last):
   File "Test.py", line 1, in <module>
     print(chr(255));
   File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
 UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>
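
The failure above can be reproduced without a real CP1257 console. The following is only a sketch: `fake_console` and `lenient_console` are hypothetical stand-ins (an in-memory buffer wrapped in a text layer), not the actual stdout from the question.

```python
import io

# Stand-in for a console that encodes text as cp1257; the BytesIO just collects bytes.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1257")

try:
    print(chr(255), file=fake_console)   # U+00FF has no mapping in cp1257
    fake_console.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# With errors="replace", characters that cannot be encoded become "?" instead of raising.
lenient_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1257", errors="replace")
print(chr(255), file=lenient_console)
lenient_console.flush()
replaced = lenient_console.buffer.getvalue()
```

Here `failed` ends up True, which suggests the error comes from the encoding attached to the stream rather than from print itself.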

By the way, print(TestText == TestText2.decode("utf8")) returns False, although the print output is the same.


How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

 def printRAW(*Text):
     RAWOut = open(1, 'w', encoding='utf8', closefd=False)
     print(*Text, file=RAWOut)
     RAWOut.flush()
     RAWOut.close()

 printRAW("Cool", TestText)

Output (now it prints in UTF-8):

 Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

Clarification:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8... it is a Unicode string in Python 3.X.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.
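
A small sketch of the str/bytes distinction, using shortened versions of the question's literals (the variable names here are made up for illustration). It may also explain the False comparison in the question: the bytes literal there starts with "Test2", while the string starts with "Test".

```python
text = "Test - \u0101\u0100"            # str: Unicode code points (ā, Ā)
data = text.encode("utf-8")             # bytes: the UTF-8 encoding of that str

# Encoding and decoding with the same codec gives back an equal string.
round_trip_ok = data.decode("utf-8") == text

# Each of these accented letters takes two bytes in UTF-8, so the bytes are longer.
longer_as_bytes = len(data) > len(text)

# Shortened form of TestText2 from the question: note the "Test2" prefix.
question_bytes = b"Test2 - \xc4\x81\xc4\x80"
equals_test = question_bytes.decode("utf-8") == text
equals_test2 = question_bytes.decode("utf-8") == "Test2 - \u0101\u0100"
```

Under these assumptions `equals_test` is False only because of the extra "2", not because of any encoding problem.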

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

import sys
sys.stdout.buffer.write(TestText2)
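
As a rough sketch, the buffer approach can be wrapped in a helper. `write_utf8` is a hypothetical name, and the code assumes the stream has a `.buffer` attribute, which may not hold if sys.stdout has been replaced (for example by an IDE or a test runner).

```python
import io
import sys

def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to the stream's underlying binary buffer."""
    stream = sys.stdout if stream is None else stream
    stream.flush()                              # emit text already queued in the text layer
    stream.buffer.write(text.encode("utf-8"))   # bypass the stream's own encoding
    stream.buffer.flush()

# Demonstration against a stand-in console that claims to be cp1257.
demo = io.TextIOWrapper(io.BytesIO(), encoding="cp1257")
write_utf8(chr(252) + "\n", demo)
written = demo.buffer.getvalue()
```

Flushing the text layer first is meant to keep earlier print output from appearing after the raw bytes.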

This is the best I can dope out from the manual, and it's a bit of a dirty hack:

utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print(whatever, file=utf8stdout)

It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.

If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen.
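
One way to keep the two wrappers from interleaving is to flush on both sides of every write. The sketch below assumes a hypothetical helper `print_utf8` and demonstrates it on a pipe rather than on the real stdout, so the bytes can be inspected.

```python
import os
import sys

def print_utf8(*args, fd=1):
    """Print through a temporary UTF-8 wrapper around an already open file descriptor."""
    sys.stdout.flush()   # flush the regular wrapper first so output stays in order
    with open(fd, "w", encoding="utf-8", closefd=False) as out:
        print(*args, file=out)
    # Leaving the with-block flushes and closes the wrapper; closefd=False keeps fd open.

# Demonstration on a pipe: write through the helper, then read the raw bytes back.
read_fd, write_fd = os.pipe()
print_utf8("Cool", chr(252), fd=write_fd)
os.close(write_fd)
with open(read_fd, "rb") as reader:
    received = reader.read()
```

Called without the fd argument, the helper would write to file descriptor 1, as in the answer above.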

As per this answer:

You can manually reconfigure the encoding of stdout as of Python 3.7:

import sys
sys.stdout.reconfigure(encoding='utf-8')
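
A hedged sketch of how this could be combined with a fallback for interpreters older than 3.7. `force_utf8` is a hypothetical helper, and the fallback branch assumes the stream is backed by a real file descriptor.

```python
import io
import sys

def force_utf8(stream):
    """Return a text stream that encodes as UTF-8, reusing `stream` when possible."""
    if hasattr(stream, "reconfigure"):       # io.TextIOWrapper on Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    # Fallback for older versions: wrap the same file descriptor a second time.
    stream.flush()
    return open(stream.fileno(), "w", encoding="utf-8", closefd=False)

# Demonstration against a stand-in console that starts out as cp1257.
demo = io.TextIOWrapper(io.BytesIO(), encoding="cp1257")
demo = force_utf8(demo)
print(chr(252), file=demo)
demo.flush()
```

In a script this would typically be applied as `sys.stdout = force_utf8(sys.stdout)` before any printing.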

I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.

But iljau's solution worked: reopen stdout with a different encoding.

import sys
sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)
