
Python script failing to correctly encode special Unicode characters

I am converting a text file (words.txt) that is basically a dictionary in this format:

good morning, Góðan daginn

into a JSON file (converted.json) in this format:

{
    "wordId": 1,
    "word": "good morning",
    "translation": "Góðan daginn"
}

The conversion from a text file to a JSON file works as expected, but the character encoding is messed up. Here is how:

Instead of writing the character ð as-is, the script encodes it like this: \Ã\°

Question: How can I fix or adjust the script so that it encodes those special characters correctly? Keep in mind that the characters are mainly Icelandic/Scandinavian and that I am using PyCharm as my IDE.

PS: Please take into consideration that my Python skills are a bit limited!

This is the script, converter.py:

import json

with open('words.txt', 'r') as f_in, \
        open('converted.json', 'w') as f_out:
    cnt = 1
    data = []
    for line in f_in:
        line = line.split(',')
        if len(line) != 2:
            continue
        d = {"wordId": cnt, "word": line[0].strip(), "translation": line[1].strip()}
        data.append(d)
        cnt += 1

    f_out.write(json.dumps(data, indent=4))

I am using Python 3

I believe the problem is in the json.dumps call; you may need to pass ensure_ascii=False. Like this:

f_out.write(json.dumps(data, indent=4, ensure_ascii=False))
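
To see what the flag changes, here is a minimal sketch that serializes one entry both ways. The entry dictionary is just the sample from the question; the variable names are made up for illustration.

```python
import json

# Sample entry taken from the question.
entry = {"word": "good morning", "translation": "Góðan daginn"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(entry)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(entry, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode back to the same data.
assert json.loads(escaped) == json.loads(readable) == entry
```

Either output is valid JSON; the flag only affects how readable the file is to a human.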

So basically, as the documentation says:

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
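
One caveat, offered as a guess about the asker's setup rather than a confirmed diagnosis: the output \Ã\° looks like ð was misread before json ever saw it. In UTF-8, ð is the two bytes C3 B0, and decoding those bytes as cp1252 or Latin-1 gives Ã°. That happens when open() falls back to a non-UTF-8 platform default, which is common on Windows. Assuming words.txt is saved as UTF-8, a sketch of the script with explicit encodings might look like this (the convert function and its parameter names are hypothetical, not from the original script):

```python
import json


def convert(src_path, dst_path):
    """Convert 'word, translation' lines into a JSON list of entries."""
    data = []
    # Assumption: the input file is saved as UTF-8.
    with open(src_path, "r", encoding="utf-8") as f_in:
        for line in f_in:
            parts = line.split(",")
            if len(parts) != 2:
                continue  # skip blank or malformed lines, as the original does
            data.append({
                "wordId": len(data) + 1,
                "word": parts[0].strip(),
                "translation": parts[1].strip(),
            })
    # Write UTF-8 explicitly so the unescaped characters can be encoded
    # regardless of the platform default.
    with open(dst_path, "w", encoding="utf-8") as f_out:
        json.dump(data, f_out, indent=4, ensure_ascii=False)
    return data
```

Calling convert("words.txt", "converted.json") would then reproduce the original behaviour with the characters preserved. If the text file was actually saved in another encoding, the encoding argument on the first open() would need to match it instead.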
