Current code:
file.write("\"" + key + "\": " + "\"" + french[key].encode('utf8') + "\"" + ',' + '\n')
where french key values in dictionary look like this:
"YOU_HAVE_COMPLETED_ENROLLMENT": "Vous avez termin\u00e9 l'inscription !"
Getting this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
Tried all the solutions on here but none seem to work.
The solution: concatenate the Unicode strings first, then encode the complete string just before writing it to the file. The codecs library simplifies this for you.
import codecs
import os
file = codecs.open(os.path.join(fr_directory, 'strings.json'), 'w+', encoding='utf8')
file.write("\"" + key + "\": " + "\"" + french[key] + "\"" + ',' + '\n')
I have opened the file with codecs.open rather than just open, specifying that the file should automatically handle encoding into UTF-8 when you write Unicode strings. I have also removed the explicit encoding call you used.
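As a fuller illustration, here is a minimal sketch of the whole write loop under this approach. The two-entry french dictionary and the temporary fr_directory are hypothetical stand-ins for the ones in the question, and the file is read back only to check the result. The u"" literals keep it runnable on both Python 2 and Python 3.

```python
import codecs
import os
import tempfile

# Hypothetical stand-ins for the question's `french` dictionary and `fr_directory`.
french = {
    u"YOU_HAVE_COMPLETED_ENROLLMENT": u"Vous avez termin\u00e9 l'inscription !",
    u"WELCOME": u"Bienvenue",
}
fr_directory = tempfile.mkdtemp()
path = os.path.join(fr_directory, 'strings.json')

# codecs.open encodes each Unicode string to UTF-8 as it is written,
# so nothing is encoded by hand inside the loop.
with codecs.open(path, 'w', encoding='utf8') as f:
    for key in sorted(french):
        f.write(u'"' + key + u'": "' + french[key] + u'",\n')

# Read the file back as Unicode to check what was written.
with codecs.open(path, 'r', encoding='utf8') as f:
    written = f.read()
```

Every piece being concatenated is a Unicode string, so no implicit ASCII conversion is ever attempted.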
The keys and values of your dictionary are almost certainly Unicode strings. A "Unicode string" needs to be encoded before it can be written to a file. Most operations in Python 2 assume an ASCII encoding unless told otherwise, and the file objects returned by open are among them. That's why, if you try to write a Unicode string to a file, you'll see an exception:
>>> with open('/tmp/test.txt', 'w') as f:
...     f.write(u"Vous avez termin\xe9 l'inscription !")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
This error is one that you can fix by encoding the string directly, so this works:
>>> with open('/tmp/test.txt', 'w') as f:
...     f.write(u"Vous avez termin\xe9 l'inscription !".encode('utf-8'))
However, this alone does not solve your problem, because you are trying to build a more complicated string. When you concatenate a Unicode string to a UTF-8 encoded "raw" string, you also get an exception, even when not writing to a file:
>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !".encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
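To see where that error comes from: when Python 2 concatenates a Unicode string with a byte string, it first tries to decode the byte string as ASCII. The sketch below performs that decoding step explicitly, which is an approximation of the implicit behaviour rather than the interpreter's exact code path. It runs on both Python 2 and Python 3.

```python
# The UTF-8 encoded value: the accented character becomes the bytes 0xc3 0xa9.
encoded = u"Vous avez termin\xe9 l'inscription !".encode('utf-8')

error = None
try:
    # Roughly what Python 2 attempts behind the scenes for unicode + str.
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# Index of the first byte that is not valid ASCII.
print(error.start)  # 16
```

The failing position is the first byte of the encoded accented character, matching the position reported in the traceback above.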
You can fix this by not encoding either string:
>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
u"YOU_HAVE_COMPLETED_ENROLLMENT: Vous avez termin\xe9 l'inscription !"
But then when you want to write it to a file, you would have to encode the whole thing again:
>>> with open('/tmp/test.txt', 'w') as f:
...     line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
...     f.write(line.encode('utf-8'))
For convenience, the codecs module lets you avoid encoding every string by hand:
>>> import codecs
>>> with codecs.open('/tmp/test.txt', 'w', encoding='utf8') as f:
...     f.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !")
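If you want to confirm what ends up on disk, a quick check along these lines may help. It is only a sketch: a temporary path stands in for /tmp/test.txt, and the file is read back twice, once as raw bytes and once decoded through codecs.open.

```python
import codecs
import os
import tempfile

# A temporary path stands in for /tmp/test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"

# Write the Unicode string; codecs.open encodes it to UTF-8.
with codecs.open(path, 'w', encoding='utf8') as f:
    f.write(line)

# Raw bytes on disk: the accented character is stored as 0xc3 0xa9.
with open(path, 'rb') as f:
    raw = f.read()

# Reading through codecs.open decodes the bytes back to Unicode.
with codecs.open(path, 'r', encoding='utf8') as f:
    decoded = f.read()
```

The raw content is one byte longer than the Unicode string, because the single accented character takes two bytes in UTF-8.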
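You could convert a byte string to a Unicode string using this function:
def _parse_value(value):
    # In Python 2, str is a byte string; decode it to unicode and trim whitespace.
    if isinstance(value, str):
        value = value.decode("utf-8", "ignore").strip()
    return value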
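The function above relies on Python 2's str being a byte string. A version-neutral sketch of the same idea might check for bytes instead, since bytes is an alias of str on Python 2. The name to_text is hypothetical and not part of the original answer.

```python
def to_text(value):
    """Decode UTF-8 byte strings to Unicode; return anything else unchanged."""
    if isinstance(value, bytes):
        # "ignore" silently drops bytes that are not valid UTF-8.
        value = value.decode("utf-8", "ignore").strip()
    return value


decoded_value = to_text(b"Vous avez termin\xc3\xa9 l'inscription ! ")
unchanged_value = to_text(u"Bienvenue")
```

Note that "ignore" discards undecodable bytes without warning, so it can hide corrupted input; "strict" (the default) would raise instead.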