
Can't print character '\u2019' in Python from JSON object

As a project to help me learn Python, I'm making a CMD viewer of Reddit using the json data (for example www.reddit.com/all/.json). When certain posts show up and I attempt to print them (that's what I assume is causing the error), I get this error:

Traceback (most recent call last):
  File "C:\Users\nsaba\Desktop\reddit_viewer.py", line 33, in <module>
    print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 32: character maps to <undefined>

Here is where I handle the data:

request = urllib.request.urlopen(url)
content = request.read().decode('utf-8')
jstuff = json.loads(content)

The line I use to print the data as listed in the error above:

print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))

Can anyone suggest where I might be going wrong?

It's almost certain that your problem has nothing to do with the code you've shown, and can be reproduced in one line:

print(u'\u2019')

If your terminal's character set can't handle U+2019 (or if Python is confused about what character set your terminal uses), there's no way to print it out. It doesn't matter whether it comes from JSON or anywhere else.
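To see this independently of JSON, here is a minimal sketch that checks whether a string fits in a given encoding. The helper name can_encode is made up for illustration, and cp437 is simply the code page named in the traceback; yours may differ.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

sample = 'It\u2019s'  # contains U+2019, RIGHT SINGLE QUOTATION MARK

# The encoding Python believes stdout uses (can be None when output is redirected).
stdout_encoding = getattr(sys.stdout, 'encoding', None)
print(ascii(stdout_encoding))

print(can_encode(sample, 'cp437'))  # False: cp437 has no U+2019
print(can_encode(sample, 'utf-8'))  # True: UTF-8 covers all of Unicode
```

If the first line printed is a legacy code page such as cp437, any character outside that code page will fail the same way when printed.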

The Windows terminal (aka "DOS prompt" or "cmd window") is usually configured for a character set like cp437 (the one in your traceback) or cp1252 that only knows about 256 of the roughly 110,000 characters in Unicode, and there's nothing Python can do about this without a major change to the language implementation.*

See PrintFails on the Python Wiki for details, workarounds, and links to more information. There are also a few hundred dups of this problem on SO (although many of them will be specific to Python 2.x, without mentioning it).


* Windows has a whole separate set of APIs for printing UTF-16 to the terminal, so Python could detect that stdout is a Windows terminal, and if so encode to UTF-16 and use the special APIs instead of encoding to the terminal's charset and using the standard ones. But this raises a bunch of different problems (eg, different ways of printing to stdout getting out of sync). There's been discussion about making these changes, but even if everyone were to agree and the patch were written tomorrow, it still wouldn't help you until you upgrade to whatever future version of Python it's added to…
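One common workaround is to escape whatever the terminal cannot represent before printing, so the print call never raises. This is only a sketch under assumptions: to_printable and safe_print are hypothetical helper names, and falling back to ASCII when stdout reports no encoding is my choice, not something from the answer above.

```python
import sys

def to_printable(text, encoding, errors='backslashreplace'):
    """Round-trip text through encoding, escaping what it cannot represent."""
    return text.encode(encoding, errors).decode(encoding)

def safe_print(text):
    # Fall back to ASCII when stdout reports no encoding (e.g. when redirected).
    encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    print(to_printable(text, encoding))

safe_print('Intel\u2019s Sharp-Eyed Social Scientist')
```

On a cp437 console this would show the title with a literal \u2019 in place of the quote. Passing errors='replace' to to_printable shows a question mark instead. On a UTF-8 terminal the text is printed unchanged.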

@N-Saba, what is the string that causes the error to be thrown? In my test case, this looks to be a version-specific bug in Python 2.7.3.

In the feed I was parsing, the "title" field had the following value:

u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'

I get the expected right single quote character when I call either of these in Python 2.7.6.

python -c "print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']"
Intel’s Sharp-Eyed Social Scientist

In 2.7.3, I get the error unless I encode the value that I pulled by key name.

print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)
print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title'].encode('utf-8', 'replace')
Intel’s Sharp-Eyed Social Scientist

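fwiw, note that print('\2019') (without the u after the backslash) is not the same thing: Python reads \201 as an octal escape, so the string is U+0081 followed by "9". The intended code is print(u'\u2019').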

I came across a similar error when attempting to write API JSON output to a .csv file via pd.DataFrame.to_csv() on a Windows install of Python 2.7.14.

Specifying the encoding as utf-8 fixed my process:

df.to_csv(filename, encoding='utf-8')  # df is the pandas DataFrame being written
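The same idea applies without pandas. The sketch below uses Python 3 and the standard library csv module instead, so it is a swapped-in illustration, not the code from this answer. It names the file encoding explicitly rather than relying on the platform default, which on Windows is typically a legacy code page. The function name write_rows_utf8 and the sample rows are made up.

```python
import csv
import os
import tempfile

def write_rows_utf8(path, rows):
    """Write rows to a CSV file, always encoding the text as UTF-8."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', encoding='utf-8', newline='') as handle:
        csv.writer(handle).writerows(rows)

rows = [['score', 'title'], [42, 'Intel\u2019s Sharp-Eyed Social Scientist']]
path = os.path.join(tempfile.mkdtemp(), 'posts.csv')
write_rows_utf8(path, rows)

# Read the file back with the same encoding to confirm the title survived.
with open(path, encoding='utf-8', newline='') as handle:
    read_back = list(csv.reader(handle))
print(read_back[1][1] == rows[1][1])
```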

For anyone encountering this on macOS with Python 2, @abarnert's answer is correct, and I was able to fix it by putting this at the top of the offending source file:

# magic to make everything work in Unicode
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

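To clarify, this changes the default encoding Python 2 uses when it implicitly converts between unicode and byte strings, so that output containing non-ASCII text no longer falls back to the ASCII codec.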
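The snippet above only works on Python 2; the reload builtin and sys.setdefaultencoding do not exist in Python 3. A rough Python 3 counterpart for output is to wrap a binary stream in a text layer with an explicit encoding and error policy. This is a sketch, not a drop-in replacement: make_text_writer is a hypothetical helper, and the demonstration writes to an in-memory buffer so that it runs anywhere.

```python
import io

def make_text_writer(binary_stream, encoding='utf-8', errors='replace'):
    """Wrap a binary stream so text written to it is encoded explicitly."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors,
                            line_buffering=True)

# Demonstration on an in-memory stream. For real terminal output you would
# pass sys.stdout.buffer instead (assumption: stdout has not been replaced
# by an object that lacks a .buffer attribute).
buffer = io.BytesIO()
writer = make_text_writer(buffer, encoding='cp437')
print('Intel\u2019s Sharp-Eyed Social Scientist', file=writer)
writer.flush()
print(buffer.getvalue())
```

With errors='replace', the character that cp437 cannot represent is written as a question mark instead of raising UnicodeEncodeError.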

I set the default font in IDLE (Python Shell) and in the Windows CMD window to Lucida Console (a font with broader Unicode glyph coverage), and these types of errors went away; you also no longer see boxes [][][][][][][][]

:)
