
Python ASCII and Unicode decode error

I got this very frustrating error when inserting a certain string into my database. It said something like:

"Python cannot decode byte characters, expecting unicode"

After a lot of searching, I saw that I could overcome this error by encoding my string into Unicode. I tried to do this by decoding the string first and then encoding it as UTF-8, like this:

string = string.encode("utf8")

And I get the following error:

'ascii' codec can't decode byte 0xe3 in position 6: ordinal not in range(128)

I have been dying with this error! How do I fix it?

You need to take a disciplined approach. "Pragmatic Unicode, or How Do I Stop The Pain?" has everything you need.

If you get that error on that line of code, then the problem is that string is a byte string, and Python 2 is implicitly trying to decode it to Unicode for you before it can encode it. That implicit decode uses the ASCII codec, and your data isn't pure ASCII. You need to know what the actual encoding is, and decode it explicitly with that encoding.
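The implicit step can be reproduced by hand. The sketch below is only an illustration: the sample bytes are made up (chosen so that 0xe3 lands at position 6, as in the question), and python2_style_encode is a hypothetical helper that imitates what Python 2 does when encode is called on a byte string. It uses only b"" and u"" literals, so it should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical sample: "Hello " followed by the UTF-8 bytes of U+3042.
raw = b"Hello \xe3\x81\x82"


def python2_style_encode(byte_string, target="utf-8"):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode."""
    return byte_string.decode("ascii").encode(target)


try:
    python2_style_encode(raw)
    failed = False
    error_position = None
except UnicodeDecodeError as exc:
    # The ASCII codec rejects the first byte above 0x7f.
    failed = True
    error_position = exc.start

# Decoding with the real encoding (assumed here to be UTF-8) works.
text = raw.decode("utf-8")
```

Here the failure comes from the hidden ASCII decode, not from the UTF-8 encode that was actually requested.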

The encode method should be used on unicode objects to convert them to str objects in a given encoding. The decode method should be used on str objects in a given encoding to convert them to unicode objects.

I suppose that your database stores strings in UTF-8. So when you get strings from the database, convert them to unicode objects by doing str.decode('utf-8'). Then use only unicode objects in your Python program (literals are written as u'unicode string'). And just before storing them in your database, convert them back to str objects with uni.encode('utf-8').
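A minimal sketch of that decode-early, encode-late pattern follows. It assumes, as above, that the database hands back UTF-8 bytes; read_from_db and write_to_db are hypothetical stand-ins for whatever database layer you actually use, and the sample value is made up.

```python
def read_from_db(raw_bytes, encoding="utf-8"):
    """Hypothetical input boundary: bytes from storage become text."""
    return raw_bytes.decode(encoding)


def write_to_db(text, encoding="utf-8"):
    """Hypothetical output boundary: text becomes bytes for storage."""
    return text.encode(encoding)


stored = b"caf\xc3\xa9"          # made-up value: UTF-8 bytes for "cafe" with e-acute
name = read_from_db(stored)      # text (unicode) from here on
greeting = u"Hello, " + name     # all processing happens on text
outgoing = write_to_db(greeting) # bytes again, only at the edge
```

Keeping bytes at the edges and text in the middle means the implicit ASCII conversion never has a chance to run.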

EDIT: As you can see from the downvotes, this is NOT THE BEST WAY TO DO IT. An excellent, and a highly recommended answer is immediately after this, so if you are looking for a good solution, please use that. This is a hackish solution that will not be kind to you at a later point of time.

I feel your pain; I've had a lot of problems with the same error. The simplest way I solved it (and this might not be the best way, and it depends on your application) was to convert things to unicode and ignore errors. Here's an example from the Unicode HOWTO in the Python v2.7.3 documentation:

>>> unicode('\x80abc', errors='strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
                    ordinal not in range(128)
>>> unicode('\x80abc', errors='replace')
u'\ufffdabc'
>>> unicode('\x80abc', errors='ignore')
u'abc'

While this might not be the most expedient method, this is a method that has worked for me.

EDIT:

A couple of people in the comments have mentioned that this is a bad idea, even though the asker accepted the answer. It is NOT a great idea: it silently drops or replaces data, so it will mangle text that contains European accented characters or any other non-ASCII characters. However, this is something you can use if it is NOT production-level code, if it is a personal project you are working on, and you need a quick fix to get things rolling. You will eventually need to fix it with the right methods, which are mentioned in the answers below.
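To make the damage concrete, here is a small sketch with a made-up sample value (the UTF-8 bytes for "São Paulo"). It compares the two lenient error handlers with a proper decode; the assumption that the data is UTF-8 is mine, for illustration only.

```python
raw = b"S\xc3\xa3o Paulo"  # made-up sample: UTF-8 bytes, a-tilde is C3 A3

# Lenient ASCII decoding "works" but loses information.
ignored = raw.decode("ascii", errors="ignore")    # non-ASCII bytes dropped
replaced = raw.decode("ascii", errors="replace")  # each bad byte -> U+FFFD

# Decoding with the actual encoding keeps the character.
correct = raw.decode("utf-8")
```

With errors='ignore' the name silently becomes "So Paulo", and nothing later in the program can tell that a character went missing.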

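The code point U+00E3 is an 'a' with a tilde (ã) in Unicode, and the single byte 0xE3 is that same character in Latin-1. In UTF-8, a 0xE3 byte is instead the first byte of a three-byte sequence. Either way, your original string is already encoded (most likely in UTF-8), so it can't be decoded using the default ASCII character set.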
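What a 0xE3 byte means depends on the encoding you assume, which is why you have to know it rather than guess. The sketch below is only a quick check; the three-byte sequence used is an arbitrary example (U+3042), not something taken from the question.

```python
single = b"\xe3"

# In Latin-1 every byte is one character; 0xE3 is a-tilde (U+00E3).
as_latin1 = single.decode("latin-1")

# In UTF-8 a lone 0xE3 is an incomplete sequence and cannot be decoded.
try:
    single.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

# Followed by two continuation bytes, 0xE3 starts one three-byte character.
as_utf8 = b"\xe3\x81\x82".decode("utf-8")

# Going the other way, a-tilde itself is two bytes in UTF-8, neither of them 0xE3.
a_tilde_utf8 = u"\xe3".encode("utf-8")
```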

A str in Python 2.7 is an encoded byte string (often ASCII), not a character string (unicode).

So when you do string.encode('some encoding'), you are actually asking Python to encode a string that is already encoded.

Python first has to decode that string using the default encoding (ASCII in Python 2.7), and only then can it encode the result. Your string is not encoded in ASCII but in some other encoding (UTF-8, Latin-1, ...), so when Python tries to decode it as ASCII, it raises an error because the ASCII codec cannot decode the bytes in your string that fall outside the ASCII range (0 - 127).

# To re-encode the byte string above, first decode it using its actual encoding (UTF-8 assumed here)
decoded_string = string.decode('utf8')
# Now encode the resulting unicode object into the target encoding
encoded_string = decoded_string.encode('utf8')


 