
Can I set Python 3.5 subprocess.Popen pipe encoding?

I have an edge-case problem. My Python script, script_A.py, has this code (abbreviated):

script_A.py:
from __future__ import unicode_literals
import codecs
import os
import subprocess
import sys

executable = 'sample.exe'

kwargs = {}
kwargs['bufsize'] = 0
kwargs['executable'] = executable
kwargs['stdin'] = subprocess.PIPE
kwargs['stdout'] = subprocess.PIPE
kwargs['stderr'] = subprocess.PIPE
kwargs['preexec_fn'] = None
kwargs['close_fds'] = False
kwargs['shell'] = False
kwargs['cwd'] = None
kwargs['env'] = None
kwargs['universal_newlines'] = True
kwargs['startupinfo'] = None
kwargs['creationflags'] = 0
if sys.version_info.major == 3 and sys.version_info.minor > 5:
    kwargs['encoding'] = 'utf-8'

args = ['', '-x']

subproc = subprocess.Popen(args, **kwargs)

# service subproc.stdout and subproc.stderr on threads
stdout = _start_thread(_get_stdout, subproc)
stderr = _start_thread(_get_stderr, subproc)

with codecs.open('myutf-8.txt', encoding='utf-8') as fh:
    for line in fh:
        if os.name == 'nt':
            subproc.stdin.write(b'%s\n' % line.rstrip().encode('utf-8'))
        else:
            subproc.stdin.write('%s\n' % line.rstrip())  # OFFENDING LINE (see traceback below)

stdout.join()

This code always works on Python 2.7.14 and 3.6.4 on Windows 8/10 and Ubuntu 16.04/17.10. Note that some of the kwargs values are different on Windows, but they are irrelevant here. It also works on Python 3.5.2 on 16.04, but only when I execute script_A.py from a GNOME terminal.

Sometimes, I need to use script_B.py to launch script_A.py instead of a terminal. Script_B.py has identical subprocess.Popen() code to launch the appropriate Python executable.

script_B.py:
import os

if os.name == 'nt':
    if use_py2:
        executable = 'C:\\Python27\\python.exe'
    else:
        executable = 'C:\\Program Files\\Python36\\python.exe'
else:
    if use_py2:
        executable = '/usr/bin/python'
    else:
        executable = '/usr/bin/python3'

args = ['', 'script_A.py']

# ---- ditto above code from here ----

I get this error when I execute script_A.py from script_B.py with Popen() on Python 3.5.2. None of the other combinations of OS/Python versions fail.

Traceback:
  File "script_A.py", line 30, in run
    subproc.stdin.write('%s\n' % line.rstrip())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

As you can see, on 2.7.14 and 3.6.4 I use version-specific code to force the pipes to UTF-8, but I don't know how to set the UTF-8 encoding on 3.5.2.

So, is there a way to configure the encoding of Popen's pipes on 3.5.2? It might be easier to drop support for Python 3.5, but I thought I'd ask here.
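A likely explanation for the 3.5.2 failure: the Python 3.5 documentation states that with universal_newlines=True the pipes are opened as text streams using the encoding returned by locale.getpreferredencoding(False). If the environment that launches script_A.py has no UTF-8 locale (an assumption here, e.g. LANG unset or set to C), that encoding is ASCII, and any non-ASCII line fails on write. The sketch below imitates a text-mode pipe with io.TextIOWrapper over an io.BytesIO; the helper encode_like_text_pipe is hypothetical, not part of subprocess.

```python
import io
import locale

# Python 3.5's Popen(universal_newlines=True) uses this encoding for its pipes.
print("Text-mode pipes here would use:", locale.getpreferredencoding(False))


def encode_like_text_pipe(text, encoding):
    """Hypothetical helper: push text through a TextIOWrapper the way a
    text-mode pipe would and return the bytes the child would receive."""
    raw = io.BytesIO()  # stands in for the child's stdin pipe
    pipe = io.TextIOWrapper(raw, encoding=encoding)
    pipe.write(text)  # text is encoded here, so a bad encoding fails here
    pipe.flush()
    return raw.getvalue()


print(encode_like_text_pipe('h\u00e9llo', 'utf-8'))  # b'h\xc3\xa9llo'
try:
    encode_like_text_pipe('h\u00e9llo', 'ascii')
except UnicodeEncodeError as exc:
    print('Same kind of failure as the traceback:', exc)
```

Running the same script under a UTF-8 locale and under an ASCII locale should show why the terminal launch works and the script_B.py launch may not.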

Your input file is UTF-8, and the program you are feeding data to expects UTF-8 input. So just send the raw bytes directly, instead of decoding them from bytes to text and then re-encoding them from text to bytes.

Get rid of the line that turns on universal_newlines mode and the line that sets kwargs['encoding'], and replace the whole with block that feeds stdin with:

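import os

blinesep = os.linesep.encode('utf-8')  # Since you seem to need OS-specific line endings
with open('myutf-8.txt', 'rb') as fh:  # binary mode: each line is raw UTF-8 bytes
    for line in fh:
        sline = line.rstrip()  # strip trailing whitespace (incl. the newline), as the original code did
        subproc.stdin.writelines((sline, blinesep))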
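To see the binary-pipe approach end to end, here is a minimal Python 3 sketch. It assumes a stand-in child process (the current interpreter running a one-line echo script, since sample.exe isn't available) and a hypothetical helper feed_file_as_bytes; neither comes from the thread. Because nothing is decoded or re-encoded, the locale's encoding never comes into play.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for sample.exe: echo stdin back to stdout as raw bytes.
CHILD = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def feed_file_as_bytes(path):
    """Send each line of a UTF-8 file to the child without decoding it."""
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            bufsize=-1)  # no universal_newlines/encoding: binary pipes
    blinesep = os.linesep.encode('utf-8')
    with open(path, 'rb') as fh:  # lines arrive as UTF-8 bytes
        for line in fh:
            proc.stdin.writelines((line.rstrip(), blinesep))
    proc.stdin.close()  # EOF tells the child to stop reading
    out = proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    return out


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write('h\u00e9llo\nw\u00f6rld\n'.encode('utf-8'))
    try:
        print(feed_file_as_bytes(path))
    finally:
        os.remove(path)
```

The child here reads all of stdin before writing anything, so writing everything first and reading afterwards cannot deadlock; with a real child that interleaves output, keep reading stdout on a separate thread as the question's code does.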

You can still handle the stdout/stderr streams as text streams if you like; just wrap them explicitly in io.TextIOWrapper with the appropriate encoding. For example, you can wrap the binary stdout with:

import io

textout = io.TextIOWrapper(subproc.stdout, encoding='utf-8')
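A minimal Python 3 sketch of the wrapping idea, assuming a stand-in child (the current interpreter writing UTF-8 bytes) and a hypothetical helper read_child_text. The Popen encoding argument only exists from 3.6, but io.TextIOWrapper's encoding argument is available on 3.5 as well, so this decodes as UTF-8 regardless of the locale.

```python
import io
import subprocess
import sys

# Hypothetical child: writes UTF-8 bytes for "héllo" plus a newline.
# The \u00e9 escape keeps the command line itself pure ASCII.
CHILD = "import sys; sys.stdout.buffer.write('h\\u00e9llo\\n'.encode('utf-8'))"


def read_child_text():
    """Read the child's binary stdout as text, decoding it as UTF-8."""
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdout=subprocess.PIPE, bufsize=-1)  # binary pipe
    # Explicit UTF-8 instead of locale.getpreferredencoding(False).
    with io.TextIOWrapper(proc.stdout, encoding='utf-8') as textout:
        lines = list(textout)
    proc.wait()
    return lines


print(read_child_text())  # ['héllo\n']
```

Closing the wrapper also closes the underlying proc.stdout, so there is no need to close the pipe separately.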

A couple side-notes:

  1. You're correct to set bufsize explicitly when calling Popen, because the default differs across Python versions and there is no other way to get consistent behavior: it is unbuffered (bufsize=0) on Python 2 and on Python 3 up to 3.2.3 and 3.3.0, and -1 (meaning "use a decent default buffer size") in 3.2.4, 3.3.1 and later. For performance, explicitly using bufsize=-1 is a good idea; you're threading the reads anyway, so buffering deadlocks aren't a concern.
  2. Never use codecs.open. It's buggy (it doesn't translate line endings, mixing readline with read(n) calls does weird things, and when no encoding is passed it doesn't even wrap the result of plain open, so the API changes, etc.), slow, and quasi-deprecated. If you need consistent behavior on Python 2.6 and higher, use io.open, which gives you the Python 3 built-in open function on every version from 2.6 up.
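As an illustration of the io.open recommendation, here is a small sketch with a hypothetical helper read_utf8_lines and a temporary file containing Windows-style line endings. With the default newline=None, io.open decodes the UTF-8 and translates '\r\n' to '\n', which is the line-ending translation the note says codecs.open lacks.

```python
import io
import os
import tempfile


def read_utf8_lines(path):
    """Hypothetical helper: read a UTF-8 text file with io.open."""
    with io.open(path, encoding='utf-8') as fh:  # newline=None: '\r\n' -> '\n'
        return [line for line in fh]


fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'caf\xc3\xa9\r\nna\xc3\xafve\r\n')  # UTF-8 bytes, CRLF endings
try:
    print(read_utf8_lines(path))  # on Python 3: ['café\n', 'naïve\n']
finally:
    os.remove(path)
```

The same function runs unchanged on Python 2.6+, where io.open returns unicode lines.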


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM