Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'

So, I have this code to fetch a JSON string from a URL:

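import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)  # fetch the raw response
string = response.read()         # response body as a byte string
data = json.loads(string)        # parse the JSON text into Python objects

for x in data:
    print x['foo']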

The problem is x['foo']: if I try to print it as shown above, I get this error.

Warning: Incorrect string value: '\\xE4\\xB8\\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8") I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try encode('ascii', 'ignore').decode('ascii'), then I get this error:

x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

x['foo'].decode("utf-8") raising UnicodeEncodeError means that x['foo'] is of type unicode. str.decode takes a str and translates it to unicode. Python 2 tries to be helpful here and implicitly converts your unicode to str so that you can call decode on it. It does this with the default encoding (sys.getdefaultencoding()), which is ascii. ASCII can't encode all of Unicode, hence the exception.

The solution here is to remove the decode call - the value is already unicode.
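
The hidden step can be reproduced by hand. The sketch below performs the ASCII encode explicitly, which is roughly what Python 2 does behind the scenes before decode ever runs. The variable names are illustrative, and the sample character is the one from the error message.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the implicit ASCII encode that precedes decode() in Python 2.
text = u'\u4e3a'  # already a text (unicode) string, not bytes

try:
    # Calling text.decode('utf-8') in Python 2 effectively starts with this encode.
    text.encode('ascii')
    message = None
except UnicodeEncodeError as exc:
    # The failure comes from the encode step, which is why the error
    # says "can't encode" even though decode was the method called.
    message = str(exc)

print(message)
```

The message captured here should end with "ordinal not in range(128)", the same tail as the traceback in the question.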

Read Ned Batchelder's presentation - Pragmatic Unicode - it will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting here that every string returned by json.load or json.loads will be unicode, not str.
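
A quick way to check this is to inspect the types directly. The payload below is a hypothetical stand-in for the real response body; it also includes a JSON null, which json.loads turns into None and which is the likely source of the AttributeError on 'NoneType' in the question.

```python
import json

# Hypothetical payload standing in for the real response body.
payload = '[{"foo": "\\u4e3a Co"}, {"foo": null}]'
data = json.loads(payload)

# type(u'') is unicode on Python 2 and str on Python 3,
# so this check works on either interpreter.
text_type = type(u'')

first = data[0]['foo']   # a JSON string
second = data[1]['foo']  # a JSON null

print(isinstance(first, text_type))  # JSON strings come back as text
print(second is None)                # JSON null comes back as None
```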


Addressing the new question after edits:

When you print, you need bytes - unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes - in Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate from the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.

This should work:

print x['foo'].encode('utf-8')
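
Since the question also hit an AttributeError on a None value, a slightly more defensive variant may be useful. This is a minimal sketch: to_utf8 and records are hypothetical names, and returning empty bytes for None is just one possible choice.

```python
# -*- coding: utf-8 -*-
def to_utf8(value):
    """Return UTF-8 bytes for a text value, or empty bytes for None."""
    if value is None:
        # Assumption: a missing value should become an empty line, not an error.
        return b''
    return value.encode('utf-8')

# Hypothetical records shaped like the parsed JSON in the question.
records = [{'foo': u'\u4e3a Co'}, {'foo': None}]
encoded = [to_utf8(x['foo']) for x in records]
```

The first entry encodes to the bytes \xe4\xb8\xba followed by " Co", the same byte sequence that appears in the warning quoted in the question. In the Python 2 loop above, print to_utf8(x['foo']) would then replace the direct encode call.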
