
Encoding of arguments to subprocess.Popen

I have a Python extension to the Nautilus file browser (AFAIK this runs exclusively on GNU/Linux/Unix/etc environments). I decided to split out an expensive computation and run it as a subprocess, pickle the result and send it back over a pipe. My question concerns the arguments to the script. Since the computation requires a path argument and a boolean argument, I figured I could do this in two ways: send the args in a pickled tuple over a pipe, or give them on the command line. I found that the pickled tuple approach is noticeably slower than just giving arguments, so I went with the subprocess argument approach.

However, I'm worried about localisation issues that might arise. At present, in the caller I have:

subprocess.Popen(
    [sys.executable, path_to_script, path.encode("utf-8"), str(recurse)],
    stdin = None,
    stdout = subprocess.PIPE)

In the script:

path = unicode(sys.argv[1], "utf-8")

My concern is that encoding the path argument as UTF-8 is a mistake, but I don't know for sure. I want to avoid an "it works on my machine" syndrome. Will this fail if a user has, say, latin1 as their default character encoding? Or does it not matter?

It does not matter: as long as your script knows to expect a UTF-8 encoding for the argument, it can decode it properly. UTF-8 is the correct choice because it lets you encode ANY Unicode string, whereas a choice such as Latin-1 can only represent the characters of some languages and not others.
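As a rough illustration of that point, here is a minimal sketch written for Python 3 (the code in the question is Python 2, where `unicode()` still exists). It assumes a POSIX system, where `subprocess` accepts `bytes` arguments. `CHILD_SOURCE`, `can_encode` and `roundtrip_through_child` are made-up names for this example, and the inline child program is only a stand-in for the real script.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the real script: recover the raw bytes of the
# first argument and write them back unchanged on stdout.
CHILD_SOURCE = (
    "import os, sys\n"
    "sys.stdout.buffer.write(os.fsencode(sys.argv[1]))\n"
)


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def roundtrip_through_child(path):
    """Pass path to a child process as UTF-8 bytes and decode what it echoes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, path.encode("utf-8")],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Both sides agreed on UTF-8, so the locale's default encoding is not used.
    return result.stdout.decode("utf-8")
```

In the child, `os.fsencode(sys.argv[1])` undoes the decoding Python 3 applies to `sys.argv`, so the bytes the caller sent come back intact and can then be decoded as UTF-8 whatever the user's locale is.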

Use sys.getfilesystemencoding() if the file names should be readable by the user. However, this can cause problems when a name contains characters that the system encoding does not support. To avoid this, you can substitute the missing characters with some character sequence (e.g. by registering your own error handling function with codecs.register_error()).
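A sketch of what such a handler might look like, in Python 3 syntax. The handler name `escape_unencodable`, the `<U+XXXX>` replacement format and the helper `encode_for_display` are all illustrative choices for this example, not anything the answer prescribes.

```python
import codecs


def escape_unencodable(error):
    """Replace each unencodable character with a visible <U+XXXX> marker."""
    # This handler only makes sense for encoding failures; pass on the rest.
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    replacement = "".join("<U+{:04X}>".format(ord(ch)) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, error.end


# Make the handler available under a name usable as the `errors` argument.
codecs.register_error("escape_unencodable", escape_unencodable)


def encode_for_display(name, encoding):
    """Encode name, substituting markers for characters the encoding lacks."""
    return name.encode(encoding, errors="escape_unencodable")
```

Calling `encode_for_display(name, sys.getfilesystemencoding())` would then produce bytes in the system encoding without raising on unsupported characters. The substitution is lossy, so the receiving side cannot recover the original name from the result.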
