
How to set sys.stdout encoding in Python 3?

Setting the default output encoding in Python 2 is a well-known idiom:

import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.
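To see what the codec writer does, here is a minimal sketch that runs under Python 3, with an in-memory io.BytesIO standing in for a byte-oriented stream such as Python 2's sys.stdout (the stand-in is an assumption for illustration):

```python
import codecs
import io

# A byte-oriented stream, standing in for Python 2's sys.stdout.
raw = io.BytesIO()

# codecs.getwriter("utf-8") returns the UTF-8 StreamWriter class;
# calling it wraps the stream so that text written to it is encoded first.
writer = codecs.getwriter("utf-8")(raw)
writer.write("ûnicöde")

# The underlying stream received UTF-8 encoded bytes, not text.
print(raw.getvalue())
```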

However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
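The failure is easy to reproduce without touching the real sys.stdout. In this sketch an io.StringIO (an assumption, chosen because it is a text stream like Python 3's sys.stdout) receives the encoded bytes and rejects them:

```python
import codecs
import io

# A text-mode stream, standing in for Python 3's sys.stdout.
text_stream = io.StringIO()
writer = codecs.getwriter("utf-8")(text_stream)

try:
    # The writer encodes the str to bytes, then passes those bytes
    # to text_stream.write(), which only accepts str.
    writer.write("日本語")
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```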

What is the correct way to do this in Python 3?

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach(), streams can be made binary by default. This function sets stdin and stdout to binary:

import sys

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
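Here is a sketch of that idiom applied to a stand-in stream rather than the real sys.stdout: an io.TextIOWrapper over an io.BytesIO plays the role of sys.stdout, and its ASCII encoding is an assumption chosen to show that the wrapper's own encoding no longer matters once it is detached. It also shows that the original text wrapper is unusable afterwards:

```python
import codecs
import io

# Stand-in for sys.stdout: an ASCII text layer over a byte buffer.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# detach() returns the underlying binary buffer and leaves the text layer unusable.
raw = fake_stdout.detach()
utf8_out = codecs.getwriter("utf-8")(raw)
utf8_out.write("日本語")

try:
    fake_stdout.write("x")
    detached_error = None
except ValueError as exc:
    # The detached wrapper refuses further writes.
    detached_error = exc
```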

Since Python 3.7, you can change the encoding of the standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
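A short sketch of the errors parameter, using a stand-in text stream rather than the real sys.stdout so the resulting bytes can be inspected. The helper name encode_via_stream, the ASCII target encoding, and the sample text are assumptions for illustration:

```python
import io

def encode_via_stream(text, errors):
    """Hypothetical helper: write text through an ASCII text stream
    reconfigured with the given error handler; return the bytes produced."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding="ascii", newline="\n")
    # The same kind of call you would make on sys.stdout (Python 3.7+).
    stream.reconfigure(errors=errors)
    stream.write(text)
    stream.flush()
    return buf.getvalue()

print(encode_via_stream("café", "replace"))           # 'é' is replaced by '?'
print(encode_via_stream("café", "backslashreplace"))  # 'é' is written as \xe9

try:
    encode_via_stream("café", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc)
```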

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use, this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

It also has the advantage of not requiring any edits to the Python code.
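To check the effect without editing the script, here is a sketch that launches a child interpreter with PYTHONIOENCODING set and asks it which encoding and error handler its stdout ended up with. It assumes sys.executable is a Python 3.7+ interpreter that honors environment variables (i.e. not run with -E or -I):

```python
import codecs
import os
import subprocess
import sys

# Copy the current environment and set the variable only for the child process.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")

child = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.stdout.encoding, sys.stdout.errors)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
encoding, errors = child.stdout.split()
print(encoding, errors)
```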

Other answers seem to recommend using codecs, but open works for me:

import sys
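# Reopen the stdout file descriptor as a new UTF-8 text stream; buffering=1 means line buffering.
# closefd=False keeps fd 1 open if this file object is closed or garbage-collected.
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1, closefd=False)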
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".
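One way to check that claim, sketched under the same assumption as above (a child interpreter launched via sys.executable that honors PYTHONIOENCODING): run the open() trick in a child whose stdout is forced to ASCII and inspect the raw bytes it emits. The child script is an illustration, not part of the original answer; it spells the text with \u escapes so the command line itself stays ASCII:

```python
import os
import subprocess
import sys

# Raw string: the child receives the \u escapes and decodes them itself.
CHILD_SCRIPT = r'''
import sys
sys.stdout = open(sys.stdout.fileno(), mode="w", encoding="utf8", buffering=1, closefd=False)
print("\u65e5\u672c\u8a9e")
'''

env = dict(os.environ, PYTHONIOENCODING="ascii")
child = subprocess.run(
    [sys.executable, "-c", CHILD_SCRIPT],
    env=env,
    capture_output=True,
    check=True,
)
# stdout is captured as bytes; strip the platform's line ending before comparing.
emitted = child.stdout.rstrip(b"\r\n")
print(emitted)
```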

"Setting the default output encoding in Python 2 is a well-known idiom"

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly break any script that tries to write binary data to stdout (which you'll need if, for example, you're a CGI script returning an image). Bytes and characters are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes characters.

CGI and HTTP in general work explicitly with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content rather than binary). This also means print is no good for CGI any more.
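A minimal sketch of that division of labor, assuming a hypothetical helper named write_cgi_response: the application encodes the page text to match the charset it declares, and only bytes reach the binary stream (sys.stdout.buffer in a real CGI script, an io.BytesIO here so the output can be checked):

```python
import io

def write_cgi_response(out, body, charset="utf-8"):
    """Hypothetical helper: write a text/html CGI response to a binary stream.

    The body is encoded here, at the application level, using the same
    charset that the Content-Type header declares.
    """
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    out.write(header.encode("ascii"))  # header fields are plain ASCII
    out.write(body.encode(charset))    # body bytes match the declared charset
    out.flush()

# In a CGI script this would be: write_cgi_response(sys.stdout.buffer, page)
buf = io.BytesIO()
write_cgi_response(buf, "<p>日本語</p>")
print(buf.getvalue())
```

If anything was already written through the text layer (sys.stdout), call sys.stdout.flush() before writing to sys.stdout.buffer so the two outputs are not reordered.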

(To add to the confusion, wsgiref's CGIHandler had been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

import io
import sys

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, write to default_out instead of stdout.)
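Two things worth knowing with this approach: the new wrapper and sys.stdout share one binary buffer but keep separate text buffers, so output written through one can appear out of order relative to the other unless it is flushed; and closing or garbage-collecting the wrapper also closes the underlying buffer. A sketch of the pattern follows, with an io.BytesIO standing in for sys.stdout.buffer; the helper name utf8_text_view and line_buffering=True are assumptions:

```python
import io

def utf8_text_view(binary_stream):
    """Hypothetical helper: wrap an existing binary stream in a separate UTF-8 text layer.

    line_buffering=True pushes each completed line down to the shared
    binary buffer as soon as a newline is written.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)

# In a script: default_out = utf8_text_view(sys.stdout.buffer)
buf = io.BytesIO()
default_out = utf8_text_view(buf)
print("日本語", file=default_out)
print(buf.getvalue())
```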

sys.stdout is in text mode in Python 3, so you write Unicode strings to it directly and the Python 2 idiom is no longer needed.

This fails in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

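However, it works just dandy in Python 3 (the trailing 7 is the return value of write(), the number of characters written, which the interactive interpreter echoes):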

>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7
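The same behavior can be observed on any text stream: a str goes in, and the stream's configured encoding decides which bytes come out. In this sketch an io.TextIOWrapper over an io.BytesIO stands in for sys.stdout (an assumption so the bytes can be inspected); note that write() counts characters, not bytes:

```python
import io

buf = io.BytesIO()
stream = io.TextIOWrapper(buf, encoding="utf-8")

chars_written = stream.write("Ûnicöde")  # return value counts characters
stream.flush()
bytes_written = len(buf.getvalue())      # Û and ö take two bytes each in UTF-8

print(chars_written, bytes_written)
```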

Now, if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of Python.
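If you suspect that problem, a quick diagnostic is to print what the interpreter decided. The function name describe_stdout_encoding is hypothetical; the attributes it reads are standard, though sys.flags.utf8_mode exists only on Python 3.7+:

```python
import locale
import sys

def describe_stdout_encoding():
    """Hypothetical helper: collect the settings that bear on stdout's encoding."""
    return {
        # What the text layer of sys.stdout will actually use.
        "stdout.encoding": getattr(sys.stdout, "encoding", None),
        "stdout.errors": getattr(sys.stdout, "errors", None),
        # The locale's preferred encoding; on POSIX, stdout usually follows it unless overridden.
        "locale preferred": locale.getpreferredencoding(False),
        # 1 if UTF-8 mode (PEP 540) is enabled, else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }

for key, value in describe_stdout_encoding().items():
    print(f"{key}: {value}")
```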
