
Python UnicodeDecodeError - How to correctly read unicode strings from subprocess?

I am having problems with Python subprocesses whose output contains non-ASCII characters, in particular the German umlauts ü, ä, ö.

My script opens a subprocess and reads its output with stdout.read(). Some of the returned strings may contain non-ASCII characters, but it is not known in advance whether or where they occur. So the output has to be decoded (or encoded?) somehow for the string to display correctly. I cannot work with a bytes object.

The following code shows in short what I am trying to do, but it fails to decode the string and raises "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 12: invalid start byte":

import subprocess

command_array = ['echo', 'string_with_ü_ä_ö']
command = subprocess.Popen(command_array, stdout=subprocess.PIPE, shell=True)

command_output = command.stdout.read()
command_output = command_output.decode()
print(command_output)

I feel that there has to be some trivial solution to this, which I failed to find anywhere. Is there any way to correctly return those unicode characters in a string?

I am using Python 3.6.3, and the above script runs on Windows. A version which works under Linux as well will be equally appreciated!

With Python >= 3.6, you want subprocess.run() with universal_newlines=True, which returns the output as str, decoded with the locale's preferred encoding:

import subprocess

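# Note: on Windows, echo is a cmd.exe builtin, so this call
# typically also needs shell=True there.
command_array = ['echo', 'string_with_ü_ä_ö']
result = subprocess.run(command_array,
                        stdout=subprocess.PIPE, universal_newlines=True)
print(result.stdout)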

Python 3.7 added text as an alias for universal_newlines; the new name better explains what the option actually does, and the old name still works.
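
If the child process writes in an encoding other than the locale default, subprocess.run() also accepts encoding and errors arguments (added in Python 3.6), which decode the output the same way but with a codec you choose. The following is only a sketch: the helper name run_decoded and the stand-in child (a Python one-liner that emits the cp850 bytes for ü, ä, ö) are assumptions made for illustration, not part of the original question.

```python
import subprocess
import sys

# Hypothetical stand-in for the real child process: it writes the
# cp850 bytes for "string_with_ü_ä_ö" (ü=0x81, ä=0x84, ö=0x94).
CHILD_CODE = r"import sys; sys.stdout.buffer.write(b'string_with_\x81_\x84_\x94')"


def run_decoded(args, encoding):
    """Run args and return its stdout decoded with the given encoding."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        encoding=encoding,   # decode stdout with this codec
        errors="replace",    # undecodable bytes become U+FFFD instead of raising
    )
    return result.stdout


decoded = run_decoded([sys.executable, "-c", CHILD_CODE], "cp850")
```

Here decoded is an ordinary str, so it can be printed or processed without a separate .decode() call. Choosing errors="replace" trades strictness for robustness; leave it out if a wrong encoding should raise instead.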

I have found by trial and error that decoding with cp850 works and yields the expected output:

import subprocess

command_array = ['echo', 'string_with_ü_ä_ö']
command = subprocess.Popen(command_array, stdout=subprocess.PIPE, shell=True)

command_output = command.stdout.read()
command_output = command_output.decode('cp850')
print(command_output)

If you save the above code as a UTF-8 encoded file (the default source encoding for Python 3, regardless of the platform) and run it with python3, it prints:

string_with_ü_ä_ö
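
The difference between the two decodes can be reproduced on plain bytes, without a subprocess. This sketch assumes the child emitted the byte sequence implied by the error message in the question (0x81 at position 12); the variable names are illustrative only.

```python
# Bytes as cmd.exe would emit them under code page 850
# (assumption based on the reported error: 0x81 at position 12).
raw = b"string_with_\x81_\x84_\x94"

try:
    raw.decode("utf-8")          # 0x81 cannot start a UTF-8 sequence
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc             # exc.start is the offending byte offset

# In cp850 the same bytes map to ü, ä and ö.
decoded = raw.decode("cp850")
```

After this runs, utf8_error.start is 12, matching the position in the traceback, and decoded holds the expected text.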

Unfortunately, I don't know where or why this particular encoding is chosen, so this might not work with different setups, but at least I am confident it will work with yours.
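
One possible way to avoid hard-coding the codec is to ask the system for it. The sketch below rests on an assumption: that on Windows the child writes in the OEM code page (which is 850 on many Western European installations), queried here through the Win32 GetOEMCP() function, and that elsewhere the locale's preferred encoding applies. The function name guess_child_encoding is hypothetical, and a child that picks its own output encoding will not follow this rule.

```python
import codecs
import locale
import sys


def guess_child_encoding():
    """Best-effort guess of the encoding a console child process uses."""
    if sys.platform == "win32":
        import ctypes  # ctypes.windll is available only on Windows
        # GetOEMCP() returns the OEM code page number, e.g. 850.
        return "cp{}".format(ctypes.windll.kernel32.GetOEMCP())
    # On other platforms, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


encoding = guess_child_encoding()
codecs.lookup(encoding)  # raises LookupError if Python has no such codec
```

The result can then be passed to bytes.decode() in place of the literal 'cp850'.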
