
Printing to stdout with encoding in Python 3

I have a Python 3 program that reads some strings from a Windows-1252 encoded file:

with open(file, 'r', encoding="cp1252") as file_with_strings:
    strings = file_with_strings.read().splitlines()  # save some strings

I later want to write these strings to stdout. I've tried:

print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)

print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'

sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface

print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte

print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string

I'm scratching my head here. I'd like to print each string exactly as it appears in the file, in cp1252. (In my terminal, running more $file shows these characters as question marks, so my terminal is probably ASCII.)

Would love some clarification! Thanks!

For anyone with the same problem, I ended up doing:

import sys

to_print = (some_string + "\n").encode("cp1252")
sys.stdout.buffer.write(to_print)
sys.stdout.flush() # I write a ton of these strings, and segfaulted without flushing
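The same workaround can be wrapped in a small helper. This is a sketch rather than the asker's code: write_encoded is a hypothetical name, and its stream parameter exists only so the bytes can be captured in memory instead of going to the real stdout. It also flushes sys.stdout before writing, so any text already queued by print() comes out first.

```python
import io
import sys


def write_encoded(text, encoding="cp1252", stream=None):
    """Hypothetical helper: encode text plus a newline and write the bytes.

    By default the bytes go to sys.stdout.buffer, the binary layer under
    sys.stdout, so Python applies no further encoding to them.
    """
    if stream is None:
        # Push out any text print() has buffered so the output stays in order.
        sys.stdout.flush()
        stream = sys.stdout.buffer
    stream.write((text + "\n").encode(encoding))
    stream.flush()


if __name__ == "__main__":
    # Capture the bytes in memory to inspect them.
    captured = io.BytesIO()
    write_encoded("caf\u00e9", stream=captured)
    print(captured.getvalue())  # b'caf\xe9\n'
```

Calling write_encoded(some_string) with no stream argument writes to the real stdout, much like the three lines above.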

Bytes encoded with cp1252 have to be decoded with cp1252 as well.

For example:

import sys

txt = "hi hello\n".encode("cp1252")
# print(txt.decode("cp1252"))  # alternative: decode back to str and print it
sys.stdout.buffer.write(txt)
sys.stdout.flush()

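This writes "hi hello" followed by a newline, encoded in cp1252, directly to stdout's underlying binary buffer, so Python performs no further encoding and the terminal receives the bytes as they are.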
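To see why the codec has to match, compare the bytes each codec produces for the same text. This sketch uses only the built-in str.encode and bytes.decode; the sample string is made up, but é (U+00E9, byte 0xE9 in cp1252) is the character from the errors in the question.

```python
text = "caf\u00e9 au lait"  # "café au lait"

cp1252_bytes = text.encode("cp1252")  # b'caf\xe9 au lait': one byte for é
utf8_bytes = text.encode("utf-8")     # b'caf\xc3\xa9 au lait': two bytes for é

# Decoding with the same codec that encoded the text restores it exactly.
assert cp1252_bytes.decode("cp1252") == text
assert utf8_bytes.decode("utf-8") == text

# Decoding cp1252 bytes as UTF-8 fails: 0xE9 announces a three-byte UTF-8
# sequence, but the byte after it (a space, 0x20) is not a continuation byte.
reason = None
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)  # invalid continuation byte
```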

Either you're piping your script's output somewhere or your locale is broken. Fix your environment rather than adapting your script to it; working around a broken environment in code makes the script very brittle.

If you're piping, stdout is not a terminal, so its encoding comes from the locale (or PYTHONIOENCODING) alone; with an unset or "C" locale, that has traditionally meant "ASCII".

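Under normal conditions, Python uses the locale to work out which encoding to apply to stdout. If your locale is broken (not installed or corrupt), Python falls back to "ASCII", and a locale of "C" does the same. (This applies to Python 3.6 and earlier; from Python 3.7, Python switches to UTF-8 when it finds itself in the C locale, per PEP 538 and PEP 540.)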
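What an ASCII stdout does to a non-ASCII string can be reproduced without touching your environment. This sketch wraps an in-memory buffer in io.TextIOWrapper with encoding="ascii", standing in for sys.stdout when Python has settled on ASCII, and prints é to it; the result is the same UnicodeEncodeError as in the question.

```python
import io

# An in-memory binary buffer standing in for the terminal or pipe.
raw = io.BytesIO()
# A text layer that encodes as ASCII, like an ASCII-configured sys.stdout.
ascii_stdout = io.TextIOWrapper(raw, encoding="ascii")

error = None
try:
    print("caf\u00e9", file=ascii_stdout)
    ascii_stdout.flush()
except UnicodeEncodeError as exc:
    error = exc
    # 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
    print(exc)
```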

Check your locale by running locale and make sure it reports no errors, e.g.:

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=
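The same settings can also be checked from inside Python. In this sketch, describe_stdout is a hypothetical helper; it only reads standard attributes from the sys and locale modules, and the values it reports will differ between machines, and between running in a terminal and piping.

```python
import locale
import sys


def describe_stdout():
    """Collect the settings that determine how str is encoded on stdout."""
    return {
        # True for a terminal, False when output is piped or redirected.
        "isatty": sys.stdout.isatty(),
        # The codec print() uses to turn str into bytes.
        "stdout_encoding": sys.stdout.encoding,
        # The locale's preferred encoding; stdout normally inherits it.
        "preferred_encoding": locale.getpreferredencoding(False),
        # LC_CTYPE as Python sees it, e.g. ('en_GB', 'UTF-8').
        "lc_ctype": locale.getlocale(locale.LC_CTYPE),
    }


if __name__ == "__main__":
    for key, value in describe_stdout().items():
        print(f"{key}: {value}")
```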

If all else fails, or if you're piping, you can override Python's locale detection by setting the PYTHONIOENCODING environment variable, e.g.:

$ PYTHONIOENCODING=utf-8 ./my_python.sh
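To confirm that the override takes effect, a script can start a child interpreter with PYTHONIOENCODING set and its output captured through a pipe, then ask which encoding the child's sys.stdout ended up with. A sketch using subprocess.run (Python 3.7+ for capture_output and text); child_stdout_encoding is a hypothetical helper name.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Return the normalized name of a child interpreter's stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,  # the child's stdout is a pipe, not a terminal
        text=True,
        check=True,
    )
    # codecs.lookup() maps aliases such as "utf8" or "CP1252" to one name.
    return codecs.lookup(result.stdout.strip()).name


if __name__ == "__main__":
    print(child_stdout_encoding("cp1252"))  # cp1252
    print(child_stdout_encoding("utf-8"))   # utf-8
```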

Remember that your shell has a locale and your terminal has an encoding; both need to be set correctly.

Since Python 3.7, you can change the encoding of all text written to sys.stdout with the reconfigure method:

import sys

sys.stdout.reconfigure(encoding="cp1252")

That could be helpful if you need to change the encoding for all output from your program.
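Because reconfigure() is a method of io.TextIOWrapper, the class behind sys.stdout, it can be tried out on an in-memory stream first. It also takes an errors argument: with errors="replace", characters the target encoding cannot represent become question marks instead of raising UnicodeEncodeError, similar to what the asker's terminal showed. A sketch (Python 3.7+):

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")

# Switch the text layer to cp1252; changing encoding after writing is allowed.
stream.reconfigure(encoding="cp1252")
stream.write("caf\u00e9")
stream.flush()
cp1252_output = raw.getvalue()  # b'caf\xe9'

# Back to ASCII, but replace unencodable characters instead of raising.
stream.reconfigure(encoding="ascii", errors="replace")
stream.write(" caf\u00e9")
stream.flush()
print(raw.getvalue())  # b'caf\xe9 caf?'
```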
