Writing unicode strings via sys.stdout in Python

Assume for a moment that one cannot use print (and thus cannot enjoy the benefit of automatic encoding detection). That leaves us with sys.stdout. However, sys.stdout is so dumb that it does not do any sensible encoding.

Now one reads the Python wiki page PrintFails and tries out the following code:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout);

However, this does not work either (at least on a Mac). To see why:

>>> import locale, sys
>>> locale.getpreferredencoding()
'mac-roman'
>>> sys.stdout.encoding
'UTF-8'

(UTF-8 is what the terminal understands.)
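To make the mismatch concrete, here is a small sketch (written for Python 3, not the Python 2 of the question) that encodes text with one codec and decodes the bytes with another, which is roughly what happens when a 'mac-roman' writer feeds a terminal that reads UTF-8. The helper name `misread` is hypothetical.

```python
def misread(text, written_as, read_as):
    """Encode text with one codec, then decode the bytes with another.

    Mimics a writer built from locale.getpreferredencoding() ('mac-roman')
    whose output is interpreted by a terminal expecting UTF-8.
    """
    raw = text.encode(written_as)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    # ascii() keeps the printout safe on terminals that cannot show U+FFFD.
    print(ascii(misread("caf\u00e9", "mac-roman", "utf-8")))
    print(ascii(misread("caf\u00e9", "utf-8", "utf-8")))
```

The first call returns 'caf\ufffd': in Mac Roman "é" is the single byte 0x8E, which is not a valid start of a UTF-8 sequence, so the terminal side cannot recover it.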

So one changes the above code to:

$ python -c 'import sys, codecs, locale; print str(sys.stdout.encoding); \
  sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout);

And now unicode strings are properly sent to sys.stdout and hence printed properly on the terminal (sys.stdout is attached to the terminal).

Is this the correct way to write unicode strings to sys.stdout, or should I be doing something else?

EDIT: at times (say, when piping the output to less) sys.stdout.encoding will be None. In this case, the above code fails, since there is no encoding to pass to codecs.getwriter.

export PYTHONIOENCODING=utf-8

This will do the job, but it cannot be set from within Python itself: the interpreter reads it only at startup.

What we can do is check whether it is set and, if not, tell the user to set it before calling the script:

import sys

if __name__ == '__main__':
    # In Python 2, encoding is None when stdout is redirected and PYTHONIOENCODING is unset.
    if sys.stdout.encoding is None:
        sys.stderr.write("Please set PYTHONIOENCODING=UTF-8 (e.g. export PYTHONIOENCODING=UTF-8) before writing to stdout.\n")
        sys.exit(1)

The best idea is to check whether you are directly connected to a terminal. If you are, use the terminal's encoding. Otherwise, use the system's preferred encoding.

import locale, sys

if sys.stdout.isatty():
    default_encoding = sys.stdout.encoding  # the terminal's encoding
else:
    default_encoding = locale.getpreferredencoding()  # file or pipe

It's also very important to always allow the user to specify whichever encoding she wants. Usually I make it a command-line option (like -e ENCODING) and parse it with the optparse module.
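A minimal sketch combining both suggestions, assuming Python 3: the isatty() check supplies the default, and a -e/--encoding option parsed with optparse (as the answer mentions) lets the user override it. The function names and the codecs.lookup() validation are illustrative additions, not part of the original answer.

```python
import codecs
import locale
import optparse
import sys


def default_encoding(stream=None):
    """The terminal's encoding when attached to a TTY, else the locale's preferred one."""
    stream = sys.stdout if stream is None else stream
    if stream.isatty() and stream.encoding:
        return stream.encoding
    return locale.getpreferredencoding()


def parse_args(argv=None):
    """Parse -e/--encoding (argv=None means sys.argv[1:]), defaulting to default_encoding()."""
    parser = optparse.OptionParser(usage="%prog [-e ENCODING]")
    parser.add_option("-e", "--encoding", dest="encoding",
                      default=default_encoding(),
                      help="output encoding (default: %default)")
    options, args = parser.parse_args(argv)
    try:
        codecs.lookup(options.encoding)  # reject names Python has no codec for
    except LookupError:
        parser.error("unknown encoding: %s" % options.encoding)  # exits with status 2
    return options, args


if __name__ == "__main__":
    opts, _ = parse_args()
    print("writing output as", opts.encoding)
```

Validating the name up front means a typo in -e fails immediately with a usage message, rather than with a LookupError at the first write.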

Another good practice is to not overwrite sys.stdout with an automatic encoder. Create your encoder and use it, but leave sys.stdout alone: you might import third-party libraries that write encoded bytestrings directly to sys.stdout, and those writes would break if sys.stdout had been replaced by an encoder.
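In Python 3 terms, that advice might look like the sketch below: build a codecs.getwriter() StreamWriter around the binary buffer underneath sys.stdout and pass it to your own code, while sys.stdout itself is never reassigned. `make_writer` and `report` are hypothetical helper names, and errors="replace" is an assumption (the default "strict" would raise on unencodable characters instead).

```python
import codecs
import sys


def make_writer(binary_stream, encoding, errors="replace"):
    """Return a StreamWriter that encodes str to `encoding` and writes bytes to binary_stream."""
    return codecs.getwriter(encoding)(binary_stream, errors)


def report(out, names):
    """Application code writes through `out`, never through a replaced sys.stdout."""
    for name in names:
        out.write(name + "\n")


if __name__ == "__main__":
    sys.stdout.flush()  # keep earlier text output ahead of the raw bytes below
    out = make_writer(sys.stdout.buffer, "utf-8")
    report(out, ["\u0411", "caf\u00e9"])
    out.flush()  # StreamWriter delegates flush() to the underlying buffer
```

Because sys.stdout is left untouched, a library that writes to it directly still sees the stream it expects.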

There is an optional environment variable, PYTHONIOENCODING, which may be set to the desired default encoding. It is one way of picking up the user's desired encoding consistently with the rest of Python. It is documented, somewhat buried, in the "Command line and environment" section of the Python manual.
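Since the variable is read when the interpreter starts, the simplest way to see its effect from Python is to launch a fresh interpreter with it set. A Python 3 sketch; the helper name is hypothetical. The value may also carry an error handler after a colon, as in ascii:backslashreplace.

```python
import os
import subprocess
import sys


def run_with_ioencoding(code, pythonioencoding):
    """Run `code` in a new interpreter with PYTHONIOENCODING set; return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=pythonioencoding)
    result = subprocess.run([sys.executable, "-c", code],
                            env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    snippet = "print('\\u0411')"  # the child prints CYRILLIC CAPITAL LETTER BE
    print(run_with_ioencoding(snippet, "utf-8"))
    print(run_with_ioencoding(snippet, "ascii:backslashreplace"))
```

The first call yields the two UTF-8 bytes of the letter; the second yields the escaped text \u0411, because ASCII cannot represent it and the backslashreplace handler takes over.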

This is what I am doing in my application:

sys.stdout.write(s.encode('utf-8'))
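That line relies on Python 2, where sys.stdout accepts byte strings. In Python 3, sys.stdout is a text stream and its write() raises a TypeError for bytes, so the encoded data has to go to the binary buffer underneath it. A sketch, with a hypothetical helper name:

```python
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to a binary stream (default: stdout's buffer)."""
    if stream is None:
        sys.stdout.flush()  # emit pending text first so output order is preserved
        stream = sys.stdout.buffer
    data = text.encode("utf-8")
    stream.write(data)
    return len(data)  # number of bytes written


if __name__ == "__main__":
    write_utf8("\u0411\n")
    sys.stdout.buffer.flush()
```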

This is the exact opposite of the fix for reading UTF-8 names from argv:

import sys

# Python 2: argv entries are byte strings; decode each one as UTF-8.
filenames = [arg.decode('utf-8') for arg in sys.argv[1:]]

This is very ugly (IMHO), as it forces you to work with UTF-8, which is the norm on Linux/Mac but not on Windows... Works for me anyway :)
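In Python 3 the decoding happens before your code runs: sys.argv already holds str, normally decoded with the filesystem encoding, and on POSIX undecodable bytes are kept as lone surrogates (the surrogateescape error handler). If you still want the "always UTF-8" behaviour described above, one sketch is to recover the original bytes with os.fsencode() and decode them yourself. The helper name and the errors="replace" choice are assumptions.

```python
import os
import sys


def argv_as_utf8(args=None):
    """Re-decode command-line arguments as UTF-8, whatever the locale says."""
    args = sys.argv[1:] if args is None else args
    # os.fsencode() reverses the interpreter's own decoding (surrogateescape on POSIX),
    # giving back the raw bytes, which are then decoded explicitly as UTF-8.
    return [os.fsencode(arg).decode("utf-8", errors="replace") for arg in args]


if __name__ == "__main__":
    for name in argv_as_utf8():
        print(name)
```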

It's not clear to me why you wouldn't be able to use print; but assuming so, yes, the approach looks right to me.

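Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM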