
UnicodeEncodeError when formatting a string with % in Python

For the life of me, I cannot figure this out. I am just trying to extract messages, and who sent them, from a .json file. I can't share the data here, but this is the line that does it:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text'])

"x" is the dict containing the fields I need. Each line of output should look like this:

<username> The quick brown fox jumps over the lazy dog.

as seen in many IRC logs. Both strings in the tuple are Unicode; that is, they are formally of Python's unicode type (I checked). However, when I try to format them into that string, I always get an error like:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f52b' in position 26: ordinal not in range(128)
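The same error can be reproduced without print at all, which helps show where it comes from. This is a minimal sketch, assuming the sender "alice" and the sample text are hypothetical stand-ins for the real data: encoding a line that contains U+1F52B with the ASCII codec fails, while UTF-8 handles it. It is written for Python 3, where the exception message shows '\U0001f52b' rather than Python 2's u'\U0001f52b'.

```python
# Hypothetical sample values standing in for x['sender_id'] and the message text.
line = u'<%s> %s' % (u'alice', u'look: \U0001f52b')

caught = None
try:
    # Roughly what print has to do when sys.stdout.encoding is ASCII.
    line.encode('ascii')
except UnicodeEncodeError as exc:
    caught = exc  # keep a reference; Python 3 unbinds exc after the except block

print(caught)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = line.encode('utf-8')
```

Note that the formatting itself succeeds; the failure only appears at the moment the text is turned into bytes with a codec that cannot represent the character.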

I have tried many things, such as writing this instead:

print u'<%s> %s' % (x['sender_id'], x['content'][0]['text'])

Or:

print '<%s> %s' % (x['sender_id'], x['content'][0]['text']).encode('utf-8')

and I have also tried combining the two approaches, among other things, but nothing works. What am I doing wrong?
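One detail about the second attempt, separate from the stdout encoding: method calls bind more tightly than the % operator, so .encode('utf-8') is applied to the tuple, not to the formatted line. A minimal sketch with hypothetical sample values (written for Python 3) shows the difference between the attempt as written and the parenthesized form that was presumably intended:

```python
sender, text = u'alice', u'look: \U0001f52b'  # hypothetical sample values

# As written in the attempt: .encode() is looked up on the tuple
# (sender, text), which has no encode method, so this raises AttributeError.
attempt_error = None
try:
    '<%s> %s' % (sender, text).encode('utf-8')
except AttributeError as exc:
    attempt_error = exc

print(attempt_error)

# Presumably intended: format the whole line first, then encode it.
line_bytes = (u'<%s> %s' % (sender, text)).encode('utf-8')
```

In Python 2, printing line_bytes writes those UTF-8 bytes to stdout as-is, with no implicit ASCII encoding step; in Python 3, print would show the bytes repr instead, so the encoding has to be handled another way.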

The problem is probably that print writes to stdout using an ASCII encoding. Check the value of sys.stdout.encoding to be sure. Either make sure you print only ASCII strings, or set stdout's encoding to something more suitable, such as UTF-8, with the PYTHONIOENCODING environment variable. For example:

$ PYTHONIOENCODING=utf-8 python myprogram.py
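If changing the environment is not an option, both approaches can also be handled inside the program. This is a minimal sketch, not the answer's own code: format_message, to_ascii and write_utf8 are hypothetical helper names, and the sample values are placeholders. to_ascii keeps the output pure ASCII by escaping non-ASCII characters; write_utf8 encodes explicitly and writes the bytes, so the stream's own (possibly ASCII) encoding is never used. It targets Python 3; the getattr fallback is there because Python 2 file objects accept byte strings directly and have no .buffer attribute.

```python
import sys


def format_message(sender, text):
    """Build one IRC-style log line as a text (unicode) string."""
    return u'<%s> %s' % (sender, text)


def to_ascii(line):
    """Option 1: print only ASCII by escaping non-ASCII characters
    as \\UXXXXXXXX (or \\xXX / \\uXXXX) sequences."""
    return line.encode('ascii', 'backslashreplace').decode('ascii')


def write_utf8(line, stream=None):
    """Option 2: encode to UTF-8 explicitly and write the raw bytes,
    bypassing the stream's configured encoding."""
    if stream is None:
        stream = sys.stdout
    stream.flush()  # push out any text already buffered on the text layer
    # Python 3 text streams expose their underlying binary stream as .buffer;
    # Python 2 file objects take byte strings directly.
    raw = getattr(stream, 'buffer', stream)
    raw.write(line.encode('utf-8') + b'\n')
    raw.flush()
```

In the question's loop this would be write_utf8(format_message(x['sender_id'], x['content'][0]['text'])). Unlike PYTHONIOENCODING, it does not depend on how the program is launched, but it bypasses print, so every line that may contain non-ASCII text has to go through the helper.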
