
Python: 'ascii' codec can't decode byte

Current code:

 file.write("\"" + key + "\": " + "\"" + french[key].encode('utf8') + "\"" + ',' + '\n')

where the key/value pairs in the french dictionary look like this:

"YOU_HAVE_COMPLETED_ENROLLMENT": "Vous avez termin\u00e9 l'inscription !"

Getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)

I have tried all the solutions on here, but none seem to work.

The solution: Concatenate unicode strings before encoding, then encode the complete string just before writing to a file. The codecs library simplifies this for you.

import codecs
import os

file = codecs.open(os.path.join(fr_directory, 'strings.json'), 'w+', encoding='utf8')
file.write("\"" + key + "\": " + "\"" + french[key] + "\"" + ',' + '\n')

I have opened the file with codecs.open rather than plain open, specifying that the file should automatically encode to UTF-8 whenever you write Unicode strings to it. I have also removed the explicit .encode('utf8') call you used.
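Since the target file is named strings.json, another option is to let the json module do the quoting and escaping instead of concatenating the quotes by hand. This is only a sketch under assumptions: the one-entry french dictionary and the temporary out_path are stand-ins of mine, not your real data or fr_directory.

```python
# -*- coding: utf-8 -*-
import codecs
import json
import os
import tempfile

# Hypothetical stand-in for the question's `french` dictionary.
french = {u"YOU_HAVE_COMPLETED_ENROLLMENT": u"Vous avez termin\u00e9 l'inscription !"}

# Hypothetical output location; the question builds its path from fr_directory.
out_path = os.path.join(tempfile.mkdtemp(), "strings.json")

# ensure_ascii=False keeps the accented character instead of emitting \u00e9.
text = json.dumps(french, ensure_ascii=False, indent=4, sort_keys=True)
if isinstance(text, bytes):
    # Python 2 can hand back a byte string here; normalise to Unicode text.
    text = text.decode("utf-8")

# The codecs file object encodes to UTF-8 on every write.
with codecs.open(out_path, "w", encoding="utf8") as f:
    f.write(text + u"\n")

# Read the file back to confirm it is valid JSON with the same content.
with codecs.open(out_path, "r", encoding="utf8") as f:
    loaded = json.load(f)
```

This also avoids the trailing comma after the last entry that the hand-built lines would leave behind.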

Further explanation:

The keys and values of your dictionary are almost certainly Unicode strings. A "Unicode string" needs to be encoded before it can be written to a file. Most operations in Python 2 assume an ASCII encoding unless told otherwise, and the file objects returned by open are among them. That's why, if you try to write a Unicode string to a file, you'll see an exception:

>>> with open('/tmp/test.txt', 'w') as f:
...    f.write(u"Vous avez termin\xe9 l'inscription !")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)

This error is one that you can fix by encoding the string directly, so this works:

>>> with open('/tmp/test.txt', 'w') as f:
...    f.write(u"Vous avez termin\xe9 l'inscription !".encode('utf-8'))

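However, this alone does not solve your problem, because you are trying to build a more complicated string. When you concatenate a Unicode string with a UTF-8 encoded byte string, Python 2 first tries to decode the byte string implicitly using the ASCII codec. Any non-ASCII byte makes that decode fail, so you get an exception even when not writing to a file: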

>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !".encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
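To see where that message comes from, the sketch below performs the same ASCII decode explicitly instead of leaving it to the concatenation. The variable names are mine, not from the question. Because nothing is concatenated, it raises the same kind of error on Python 3 as well:

```python
# -*- coding: utf-8 -*-
# The UTF-8 encoded value: the accented letter becomes the two bytes 0xc3 0xa9.
encoded = u"Vous avez termin\xe9 l'inscription !".encode("utf-8")

failure = None
try:
    # This is the decode that Python 2 attempts silently during unicode + str.
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    # Record the codec name, the failing offset and the offending byte.
    failure = (exc.encoding, exc.start, encoded[exc.start:exc.start + 1])
```

Here failure ends up holding the codec name, the offset of the first non-ASCII byte, and that byte itself, which are the same three details the traceback reports.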

You can fix this by not encoding either string:

>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
u"YOU_HAVE_COMPLETED_ENROLLMENT: Vous avez termin\xe9 l'inscription !"

But then when you want to write it to a file, you would have to encode the whole thing again:

>>> with open('/tmp/test.txt', 'w') as f:
...    line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
...    f.write(line.encode('utf-8'))

For convenience, the codecs module lets you open a file that encodes on every write, so you do not have to encode each string yourself:

>>> import codecs
>>> with codecs.open('/tmp/test.txt', 'w', encoding='utf8') as f:
...    f.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !")
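If you would rather not depend on codecs, io.open accepts the same encoding argument on Python 2.6+ and is the built-in open on Python 3. A minimal sketch, assuming a temporary file in place of /tmp/test.txt (the path and variable names are illustrative only):

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Build the whole line as Unicode first, exactly as in the examples above.
line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"

# Hypothetical temporary path, used so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# io.open encodes to UTF-8 on write, so no explicit .encode() call is needed.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(line + u"\n")

# Reading the raw bytes shows the accented letter stored as 0xc3 0xa9.
with io.open(path, "rb") as f:
    raw = f.read()

# Reading in text mode decodes the bytes back to the original Unicode string.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()
```

The file object only accepts Unicode strings, so an accidental byte string fails with a TypeError at the write instead of going through an implicit decode.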

You could convert a byte string to a Unicode string using this function:

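def _parse_value(value):
    # Python 2: str is a byte string, so decode it to unicode.
    # "ignore" silently drops any bytes that are not valid UTF-8.
    if isinstance(value, str):
        value = value.decode("utf-8", "ignore").strip()
    return value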
