
Python subprocess echo a unicode literal

I'm aware that questions like this have been asked before, but I'm not finding a solution.

I want to use a unicode literal, defined in my Python file, with the subprocess module, but I'm not getting the results that I need. For example, the following code

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode('utf-8')
    new_cmd.append(c)
subprocess.call(new_cmd)

prints out

你好

If I change the code to

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode(sys.getfilesystemencoding())
    new_cmd.append(c)
subprocess.call(new_cmd)

I get the following

??

At this stage I can only assume I'm repeatedly making a simple mistake, but I'm having a hard time figuring out what it is. How can I get echo to print out the following when invoked via Python's subprocess?

你好

Edit:

The version of Python is 2.7. I'm running on Windows 8 but I'd like the solution to be platform independent.

Your first try was the best.

You actually encoded the two Unicode characters u'你好' (or u'\u4f60\u597d') as UTF-8, which gives b'\xe4\xbd\xa0\xe5\xa5\xbd'.

You can check this in IDLE, which fully supports Unicode: there, b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8') gives back 你好. Another way to check is to redirect the script output to a file and open it with a UTF-8 compatible editor: there again you will see what you want.
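The round trip described above can be sketched as follows. This is a minimal illustration that assumes Python 3, and the variable names are made up for the example. It prints escaped forms, so the result does not depend on what the console can display:

```python
# Round-trip the two characters through UTF-8.
text = u'\u4f60\u597d'            # the same string as u'你好'
encoded = text.encode('utf-8')    # six bytes, three per character
decoded = encoded.decode('utf-8')

print(encoded)          # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(len(encoded))     # 6
print(decoded == text)  # True
```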

But the problem is that the Windows console does not fully support Unicode. It depends on:

  • the code page installed: I do not know about Windows 8, but previous versions had poor support for Unicode and could display only 256 characters
  • the font used in the console: not all fonts have glyphs for all characters.

If you know a code page that contains your characters (I don't), you can try to set it in the console with chcp and explicitly encode your unicode string to it. But on my French machine, I do not know how to do that, except by going through a text file!
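One way to look for such a code page is to ask Python which codecs can encode the string at all. The sketch below assumes Python 3; can_encode is a hypothetical helper, and the list of candidate code pages is only an illustration, not a recommendation:

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


text = u'\u4f60\u597d'
# Candidate code pages chosen for illustration only.
for codec in ('cp437', 'cp850', 'cp1252', 'cp936'):
    print(codec, can_encode(text, codec))
```

Of these four, only cp936 (the Simplified Chinese code page) can encode the two characters. Whether the console then displays them still depends on the font, as noted above.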

As you mentioned ConEmu, I gave it a try, and it works fine with it, using Python 3.4!

chcp 65001
py -3
import subprocess
cmd = ['cmd', '/c', 'echo', u'\u4f60\u597d']
subprocess.call(cmd)

gives:

你好  
0

The problem is only in the cmd.exe window!

Conclusion: Pay attention to character encodings (there are three different character encodings here). Use Python 3 if you want portable Unicode support (pass arguments as Unicode, don't encode them), or make sure that the data can be represented using the current character encodings from the environment (encode using sys.getfilesystemencoding() on Python 2, as you do in the 2nd code example).


The first code example is incorrect. The effect is the same as (run it in IDLE: py -3 -midlelib):

>>> print(u'你好'.encode('utf-8').decode('mbcs')) #XXX DON'T DO IT!
ä½ å¥½

where the mbcs codec uses your Windows ANSI code page (typically the cp1252 character encoding; it may be different, e.g., cp1251 on Russian Windows).

Python 2 starts a subprocess through the CreateProcess macro, which is equivalent to the CreateProcessA function there. CreateProcessA interprets input bytes as being encoded using your Windows ANSI encoding. This is unrelated to the Python source code encoding (utf-8 in your case).

It is expected that you get mojibake if you use the wrong encoding.
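The mismatch can be reproduced on any platform by decoding the UTF-8 bytes with a different codec. In this sketch (Python 3), cp1252 stands in for the Windows ANSI code page; that is an assumption, since the actual code page depends on the machine:

```python
text = u'\u4f60\u597d'
utf8_bytes = text.encode('utf-8')

# cp1252 stands in for the ANSI code page here (an assumption; yours may differ).
misread = utf8_bytes.decode('cp1252')
print(ascii(misread))    # '\xe4\xbd\xa0\xe5\xa5\xbd': six wrong characters

# The damage is reversible as long as no byte was dropped along the way.
recovered = misread.encode('cp1252').decode('utf-8')
print(recovered == text)  # True
```

Two characters become six because each UTF-8 byte is read as a separate cp1252 character.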


Your second code example should work if two conditions hold. First, the input characters must be representable in your Windows ANSI code page, such as cp1252, so that encoding from Unicode to bytes succeeds. Second, either echo must use a Unicode API such as WriteConsoleW() to print to the Windows console (see the Python 3 package win-unicode-console: it enables print(u'你好') whatever your chcp ("OEM") code page is, as long as the console font supports the characters), or the characters must be representable in the OEM code page used by cmd.exe, such as cp437 (run chcp to find out yours). The ?? question marks indicate that 你好 can't be represented using your console encoding.
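A lossy conversion of this kind is easy to reproduce. The sketch below (Python 3) encodes the string to two code pages that lack the characters, using the standard errors='replace' handler. The choice of cp437 and cp1252 is an assumption for illustration, and the sketch does not establish at which layer the replacement happened in the original run:

```python
text = u'\u4f60\u597d'

# Each character that the target code page lacks is replaced with '?'.
for codec in ('cp437', 'cp1252'):
    lossy = text.encode(codec, errors='replace')
    print(codec, lossy)   # b'??' in both cases
```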

To support arbitrary Unicode arguments (including characters that can't be represented using either Windows ("ANSI") or MS-DOS (OEM) code pages), you need the CreateProcessW function (which is used by Python 3). See Unicode filenames on Windows with Python & subprocess.Popen().
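On Python 3, the argument can be passed as a str without any manual encoding. The following is a minimal sketch under stated assumptions: the child process is another Python interpreter rather than echo, so that it runs the same way on every platform, and it reports the argument in escaped form so that the result does not depend on the console encoding:

```python
import subprocess
import sys

text = '\u4f60\u597d'

# Child program: print the first argument with non-ASCII characters escaped.
child = "import sys; print(ascii(sys.argv[1]))"

# The argument is passed as str; Python 3 handles the conversion itself.
out = subprocess.check_output([sys.executable, '-c', child, text])
print(out.decode('ascii').strip())
```

If the argument arrives intact, the child reports the same two code points that the parent sent.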
