
How to write utf8 to standard output in a way that works with python2 and python3

I want to write a non-ASCII character, let's say →, to standard output. The tricky part seems to be that some of the data that I want to concatenate to that string is read from JSON. Consider the following simple JSON document:

{"foo":"bar"}

I include this because if I just want to print → then it seems enough to simply write:

print("→")

and it will do the right thing in python2 and python3.

So I want to print the value of foo together with my non-ASCII character →. The only way I found to do this such that it works in both python2 and python3 is:

getattr(sys.stdout, 'buffer', sys.stdout).write(data["foo"].encode("utf8")+u"→".encode("utf8"))

or

getattr(sys.stdout, 'buffer', sys.stdout).write((data["foo"]+u"→").encode("utf8"))

It is important not to miss the u in front of "→", because otherwise python2 will throw a UnicodeDecodeError.
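The getattr approach above can be wrapped in a small helper so the call sites stay short. This is only a sketch: the name write_utf8 and its stream parameter are hypothetical and not part of the original code.

```python
# -*- coding: utf-8 -*-
import json
import sys


def write_utf8(text, stream=None):
    """Encode `text` as UTF-8 and write the bytes (hypothetical helper)."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(text.encode('utf-8'))


if __name__ == "__main__":
    data = json.loads(u'{"foo":"bar"}')
    # u"\u2192" is the arrow character; the u prefix keeps python2 happy.
    write_utf8(data["foo"] + u"\u2192" + u"\n")
```

Passing the stream explicitly makes it possible to check the bytes that would be written without touching the real standard output.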

Using the print function like this:

print((data["foo"]+u"→").encode("utf8"), file=(getattr(sys.stdout, 'buffer', sys.stdout)))

doesn't seem to work because python3 will complain TypeError: 'str' does not support the buffer interface.

Did I find the best way or is there a better option? Can I make the print function work?

The most concise I could come up with is the following, which you may be able to make more concise with a few convenience functions (or even by replacing/overriding the print function):

# -*- coding=utf-8 -*-
import codecs
import os
import sys

# if you include the -*- coding line, you can use this
output = 'bar' + u'→'
# otherwise, use this
output = 'bar' + b'\xe2\x86\x92'.decode('utf-8')

if sys.stdout.encoding == 'UTF-8':
    print(output)
else:
    output += os.linesep
    if sys.version_info[0] >= 3:
        sys.stdout.buffer.write(output.encode('utf-8'))
    else:
        codecs.getwriter('utf-8')(sys.stdout).write(output)

The best option is using the -*- coding line, which allows you to use the actual character in the file. But if for some reason you can't use the coding line, it's still possible to accomplish this without it.

This (both with and without the coding line) works on Linux (Arch) with python 2.7.7 and 3.4.1. It also works if the terminal's encoding is not UTF-8. On Arch Linux, I just change the encoding by using a different LANG environment variable:

LANG=zh_CN python test.py
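When experimenting with different LANG values, it may help to compare sys.stdout.encoding in a way that does not depend on spelling or case. The sketch below is an assumption-laden illustration: is_utf8 is a hypothetical helper, and it relies on codecs.lookup to normalize aliases such as 'UTF-8' and 'utf8'. The encoding can also be None, for example in python2 when output is piped.

```python
import codecs
import sys


def is_utf8(encoding):
    """Return True if `encoding` names UTF-8 under any alias (hypothetical helper)."""
    if not encoding:
        # sys.stdout.encoding can be None, e.g. in python2 when piped.
        return False
    try:
        # codecs.lookup() maps aliases to one canonical codec name.
        return codecs.lookup(encoding).name == 'utf-8'
    except LookupError:
        # Unknown encoding name: treat it as "not UTF-8".
        return False


if __name__ == "__main__":
    print(sys.stdout.encoding, is_utf8(sys.stdout.encoding))
```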

The script also sort of works on Windows, which I tried with 2.6, 2.7, 3.3, and 3.4. By "sort of", I mean I could get the '→' character to display only on a mintty terminal. On a cmd terminal, that character would display as 'ΓåÆ'. (There may be something simple I'm missing there.)
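The 'ΓåÆ' output is consistent with the three UTF-8 bytes of '→' being displayed in code page 437. That the cmd console in question used code page 437 is an assumption, not something stated above. A short check:

```python
# Assumption: the cmd console uses code page 437 (not stated in the answer).
utf8_bytes = u"\u2192".encode("utf-8")   # the arrow as UTF-8: b'\xe2\x86\x92'
mojibake = utf8_bytes.decode("cp437")    # how cp437 interprets those bytes

# Compare via escapes so the check does not depend on the console encoding:
# U+0393 (Gamma), U+00E5 (a with ring), U+00C6 (AE ligature).
print(mojibake == u"\u0393\u00e5\u00c6")  # True
```

If that is what happened, the script wrote correct UTF-8 and the console simply decoded it with a different code page.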

If you don't need to print to sys.stdout.buffer, then the following should print fine to sys.stdout. I tried it in both Python 2.7 and 3.4, and it seemed to work fine:

# -*- coding=utf-8 -*-
print("bar" + u"→")
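If the output does have to go through the byte stream, print itself can probably still be used by wrapping that stream in a UTF-8 writer. A minimal sketch, assuming the hypothetical helper name utf8_writer; codecs.getwriter is the same standard-library call used earlier:

```python
from __future__ import print_function  # python2 needs this for print(..., file=...)

import codecs
import sys


def utf8_writer(stream=None):
    """Wrap a byte stream so that text written to it is encoded as UTF-8.

    Hypothetical helper; defaults to the byte stream behind sys.stdout.
    """
    if stream is None:
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    return codecs.getwriter('utf-8')(stream)


if __name__ == "__main__":
    out = utf8_writer()
    # print() hands text to the writer, which encodes it before writing.
    print(u"bar" + u"\u2192", file=out)
```

The writer receives text and does the encoding itself, so print never sees bytes and the TypeError from the question does not arise. When mixing this with ordinary writes to sys.stdout, flushing between the two is likely needed to keep the output in order.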
