UnicodeEncodeError when formatting string with % in Python

Question

For the life of me, I cannot figure this out. I am just trying to extract messages, and who sent them, from a .json file. I cannot share the data here, but this is the line that does it:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text'])

Here, x is the dict that holds the fields I need. Each output line should look like this:

<username> The quick brown fox jumps over the lazy dog.

as seen in many IRC logs. Both strings in the tuple are Unicode; I checked, and they really are of Python's unicode type. However, whenever I try to format them into that string, I get something like:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f52b' in position 26: ordinal not in range(128)

I have tried many things, such as writing this instead:

print u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])

Or:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')

I have also tried combining those two approaches, and various other things, but nothing works. What am I doing wrong?
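
A minimal sketch of where this error comes from, written for Python 3 (the question itself uses Python 2's print statement). The record x below is a hypothetical stand-in for the undisclosed JSON data. It shows that the % formatting builds the line without trouble, that UnicodeEncodeError appears only when that text is encoded as ASCII, and why the .encode('utf-8') attempt could not work: a method call binds more tightly than %, so .encode is looked up on the tuple rather than on the formatted string.

```python
# Written for Python 3; the question's code is Python 2.
# Hypothetical stand-in for one record of the undisclosed .json file.
x = {'sender_id': u'username',
     'content': [{'text': u'The quick brown fox \U0001f52b'}]}

# 1. % formatting only builds text; nothing is encoded at this point.
line = u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])

# 2. Encoding that text as ASCII is what raises the question's error.
ascii_error = None
try:
    line.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc  # exc.reason is 'ordinal not in range(128)'

# 3. Without extra parentheses, .encode binds to the tuple, not the line.
precedence_error = None
try:
    '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')
except AttributeError as exc:
    precedence_error = exc  # 'tuple' object has no attribute 'encode'

# 4. Parenthesizing the whole expression encodes the formatted line instead.
data = ('<%s> %s' % (x['sender_id'], x['content'][0]['text'])).encode('utf-8')
print(data)  # a bytes object; the emoji becomes four UTF-8 bytes
```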

Answer 1

The problem is most likely that print writes to stdout using an ASCII encoding. The % formatting itself succeeds and produces a unicode string; the error is raised when print encodes that string for output. Check the value of sys.stdout.encoding to be sure. Either make sure you print only ASCII strings, or set stdout's encoding to something more suitable, such as UTF-8, with the PYTHONIOENCODING environment variable. For example:

$ PYTHONIOENCODING=utf-8 python myprogram.py
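
The first remedy, printing only ASCII, can also be applied inside the program. A minimal sketch, assuming Python 3 text streams (Python 2's file objects behave differently); the helper write_line and the in-memory demo stream are hypothetical. It uses the stream's declared encoding, as suggested above, falls back to ASCII when none is declared, and relies on the standard 'backslashreplace' error handler to escape characters that encoding cannot represent instead of raising UnicodeEncodeError.

```python
import io
import sys


def write_line(line, stream=None):
    """Write one text line to `stream` (default: sys.stdout) without raising
    UnicodeEncodeError, and return the text that was actually written."""
    if stream is None:
        stream = sys.stdout
    # Use the encoding the stream declares; some streams (for example a
    # redirected stdout under Python 2) report None, so assume ASCII then.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    # 'backslashreplace' turns characters the encoding cannot represent
    # into escapes: U+1F52B becomes the ten ASCII characters \U0001f52b.
    safe = line.encode(encoding, 'backslashreplace').decode(encoding)
    stream.write(safe + u'\n')
    return safe


# Demo: an in-memory stream that only accepts ASCII stands in for an
# ASCII-encoded stdout (an illustration, not the real stdout).
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii', newline='\n')
written = write_line(u'<username> The quick brown fox \U0001f52b', ascii_stream)
ascii_stream.flush()
print(written)  # the emoji is shown as a backslash escape
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') switches the encoding from inside the program, with much the same effect as PYTHONIOENCODING. Escaping keeps every line ASCII-safe at the cost of showing \U0001f52b instead of the character; writing UTF-8 keeps the character but needs a terminal or file that expects UTF-8.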

Accepted answer, 2013-08-18 22:47:46
