pprint: UnicodeEncodeError: 'ascii' codec can't encode character

Question

This is driving me crazy. I'm trying to pprint a dict that contains an é character, and it fails with an exception.

I'm using Python 3:

    from pprint import pprint
    knights = {'gallahad': 'the pure', 'robin': 'the bravé'}
    pprint(knights)

Error:

    File "/data/prod_envs/pythons/python36/lib/python3.6/pprint.py", line 176, in _format
    stream.write(rep)
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 43: ordinal not in range(128)

I read up on the Python ASCII doc, but there does not seem to be a quick way to solve this, other than taking the dict apart, rewriting the offending value to an ASCII value via .encode, and then re-assembling the dict again.

Is there any way I can get this to print without taking the dict apart?

Answer 1

This is unrelated to pprint: the module only formats the object into a string and then writes that string to the underlying stream. So your error occurs when the é character (U+00E9) is written to stdout.
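A minimal sketch of that point, assuming an in-memory stream opened with encoding='ascii' is a fair stand-in for the misconfigured stdout (ascii_stream is a name made up for this illustration). Formatting succeeds on its own; only the write to the ASCII stream fails:

```python
import io
from pprint import pformat, pprint

knights = {'gallahad': 'the pure', 'robin': 'the bravé'}

# Formatting alone never goes through an encoder, so it succeeds.
text = pformat(knights)

# Stand-in for a stdout whose encoding is ASCII (assumption for this sketch).
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

try:
    pprint(knights, stream=ascii_stream)
    ascii_stream.flush()
    failed = False
except UnicodeEncodeError as exc:
    # Same exception type as in the question, raised by the stream write.
    failed = True
    print('write failed:', exc.reason)
```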

Now it really depends on the underlying OS and the configuration of the Python interpreter. On Linux or other Unix-like systems, you could try telling Python to use UTF-8 or Latin-1 for its standard streams by setting the environment variable PYTHONIOENCODING before starting Python:

    $ export PYTHONIOENCODING=Latin1
    $ python

(or use PYTHONIOENCODING=utf8 depending on the actual encoding of your terminal or terminal window).
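One way to check that the variable is being picked up is to start a child interpreter with it set and ask what encoding its stdout ended up with. This is only a sketch: child_code and reported are names invented here, and the exact spelling of the reported encoding name may vary between Python versions.

```python
import os
import subprocess
import sys

# Child program: report which encoding its stdout was given.
child_code = "import sys; print(sys.stdout.encoding)"

# Copy the current environment and add the override for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
)

# The encoding name itself is plain ASCII, so decoding it is safe.
reported = result.stdout.decode("ascii").strip()
print(reported)
```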

Answer 2

Standard input and output are file objects in Python. The Python 3 documentation says that, when these objects are created, if encoding is left unspecified then locale.getpreferredencoding(False) is called to fetch the locale's preferred encoding.
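To see what Python has actually picked up on a given machine, something like the following can be run (a small sketch; the variable names are mine). On an ASCII-only setup the preferred encoding is typically reported under an ASCII alias such as ANSI_X3.4-1968 rather than the literal name "ascii", which is why the check goes through codecs.lookup:

```python
import codecs
import locale
import sys

# The value Python falls back to when no encoding is given for a text stream.
preferred = locale.getpreferredencoding(False)
print("locale preferred encoding:", preferred)
print("sys.stdout encoding:", sys.stdout.encoding)

# Normalise aliases so that names like ANSI_X3.4-1968 are recognised as ASCII.
is_ascii_only = codecs.lookup(preferred).name == "ascii"
print("ASCII only:", is_ascii_only)
```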

Your system should have been set up with one or more "locales" when GNU/Linux was installed (I'm guessing from your paths that you are using some version of GNU/Linux). On a "sensible" setup, the default locale should allow UTF-8. But if you only did a "minimal" installation (for example as part of setting up a container), it is possible that the system has set the locale to "C" (the ultimate fallback locale), which does not support UTF-8.

Just because your terminal can accept UTF-8 (as demonstrated by using echo with a UTF-8 string) does not mean Python knows that UTF-8 is acceptable. If Python sees the locale set to "C" then it will assume only ASCII is allowed unless told otherwise.

You can check the current locale by typing locale at the shell prompt, and change it by setting the LC_ALL environment variable. But before changing it you must check with locale -a to see which locales are available on your system; otherwise your change may not be effective and you may get the "C" locale anyway. If your system has not been set up with the locale you want, you can add it if you have root access. Most GNU/Linux distributions provide options to do this when you (re)configure a package called locales, so for example on Debian/Ubuntu-based distros, sudo dpkg-reconfigure locales should show you the options.
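If it is more convenient to do that check from inside Python than from the shell, a rough equivalent looks like this. It is a sketch only: locale_is_available is a helper made up for illustration, and "en_US.UTF-8" is just an example name that may or may not exist on your system.

```python
import locale
import os

# For character encoding, LC_ALL takes precedence over LC_CTYPE, then LANG.
for name in ("LC_ALL", "LC_CTYPE", "LANG"):
    print(name, "=", repr(os.environ.get(name)))


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Put the original setting back so the probe has no lasting effect.
        locale.setlocale(locale.LC_CTYPE, saved)


print("C available:", locale_is_available("C"))
print("en_US.UTF-8 available:", locale_is_available("en_US.UTF-8"))
```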

But sometimes you will be in the awkward position of having to write a Python script to run on a system that has not been set up with decent locales and there's nothing you can do about it because you don't have root and the sysadmin insists on giving you the absolute minimum. Then what do we do?

Well, there are options within Python itself. You could run export PYTHONIOENCODING=utf-8 before running Python, to tell Python to use that encoding no matter what the locale says. Or you could give pprint a stream= parameter, set to a stream that you've opened yourself using open() with an encoding="utf-8" parameter (although this is no good if you want to use sys.stdout or os.popen instead of a file). Or you could upgrade to Python 3.7 or later and use sys.stdout.reconfigure(encoding='utf-8') (but this won't work in the Python 3.6 mentioned in the original question).

Or you could import codecs, create a writer with w = codecs.getwriter("utf-8")(sys.stdout.buffer), and then pass stream=w to pprint:

    import codecs
    import sys
    from pprint import pprint

    # Wrap the raw byte stream under stdout in a writer that always encodes as UTF-8.
    w = codecs.getwriter("utf-8")(sys.stdout.buffer)
    d = {"testing": "这是个考验"}
    pprint(d, stream=w)
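For comparison, here is a rough sketch of two of the other options mentioned above: passing pprint a file opened with an explicit encoding, and reconfiguring stdout on Python 3.7 or later. The output path is a throwaway temporary file chosen for this example, and the hasattr guard is there so the snippet is simply skipped on Python 3.6 or when stdout has been replaced by an object without reconfigure.

```python
import os
import sys
import tempfile
from pprint import pprint

d = {"testing": "这是个考验"}

# Option 1: give pprint a file opened with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "knights.txt")  # hypothetical output path
with open(path, "w", encoding="utf-8") as out:
    pprint(d, stream=out)

# Read it back with the same encoding to confirm what was written.
with open(path, encoding="utf-8") as f:
    written = f.read()

# Option 2: on Python 3.7+, switch stdout itself to UTF-8.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
    pprint(d)
```

The file route sidesteps the locale entirely, while reconfigure changes how everything printed afterwards is encoded, so it is best called once near the start of the script.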

2 answers

solution1
7 2017-12-30 14:09:04

solution2
0 2020-02-04 12:26:53
