I am converting a text file (words.txt) that is basically a dictionary in this format:
good morning, Góðan daginn
into a JSON file (converted.json) in this format:
{
    "wordId": 1,
    "word": "good morning",
    "translation": "Góðan daginn"
}
The conversion from the text file to the JSON file works as expected, but the character encoding is messed up.
For example, instead of writing the character ð as ð (or as its JSON escape \u00f0), the script writes it as \u00c3\u00b0, i.e. the two characters Ã°.
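The characters Ã and ° in that output are a telltale sign of a decoding mismatch: in UTF-8, ð (U+00F0) is stored as the two bytes C3 B0, and reading those bytes as Latin-1 (or cp1252, which is often the default encoding for open() on Windows) yields Ã followed by °. The minimal sketch below reproduces the symptom; it assumes, without confirmation from the post, that words.txt is saved as UTF-8 but read with a different default encoding.

```python
import json

# ð is U+00F0; in UTF-8 it is encoded as the two bytes C3 B0.
utf8_bytes = "ð".encode("utf-8")
print(utf8_bytes)  # b'\xc3\xb0'

# Decoding those bytes with the wrong codec (Latin-1 here, standing in for
# a non-UTF-8 platform default) turns one character into two: misread == "Ã°".
misread = utf8_bytes.decode("latin-1")

# json.dumps escapes non-ASCII characters by default, giving the reported output.
print(json.dumps(misread))  # "\u00c3\u00b0"
```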
Question: How can I fix the script so that it encodes these special characters correctly? Note that the characters are mainly Icelandic/Scandinavian, and I am using PyCharm as my IDE.
PS: Please keep in mind that my Python skills are a bit limited!
This is the script, converter.py:
import json

with open('words.txt', 'r') as f_in, \
        open('converted.json', 'w') as f_out:
    cnt = 1
    data = []
    for line in f_in:
        line = line.split(',')
        if len(line) != 2:
            continue
        d = {"wordId": cnt, "word": line[0].strip(), "translation": line[1].strip()}
        data.append(d)
        cnt += 1
    f_out.write(json.dumps(data, indent=4))
I am using Python 3.
I believe the problem is json.dumps: you may need to pass ensure_ascii=False, like this:
f_out.write(json.dumps(data, indent=4, ensure_ascii=False))
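Passing ensure_ascii=False only changes whether non-ASCII characters are escaped. If words.txt is also being decoded with the wrong codec (Ã° in place of ð is the classic symptom of UTF-8 bytes read as Latin-1/cp1252), the output may still be wrong, just unescaped. A minimal sketch of the whole converter that additionally gives open() an explicit encoding is shown below. It assumes words.txt is saved as UTF-8; the convert function and its parameters are hypothetical additions for illustration, not part of the original script, and json.dump(..., f_out) is used as an equivalent of f_out.write(json.dumps(...)).

```python
import json


def convert(src_path="words.txt", dst_path="converted.json"):
    """Convert 'word, translation' lines into a JSON list of objects."""
    data = []
    cnt = 1
    # Read and write explicitly as UTF-8 instead of relying on the platform
    # default encoding (assumption: words.txt was saved as UTF-8).
    with open(src_path, "r", encoding="utf-8") as f_in, \
            open(dst_path, "w", encoding="utf-8") as f_out:
        for line in f_in:
            parts = line.split(",")
            # Skip blank or malformed lines, as the original script does.
            if len(parts) != 2:
                continue
            data.append({
                "wordId": cnt,
                "word": parts[0].strip(),
                "translation": parts[1].strip(),
            })
            cnt += 1
        # ensure_ascii=False writes ð as-is rather than as \u00f0.
        json.dump(data, f_out, indent=4, ensure_ascii=False)
    return data
```

Calling convert("words.txt", "converted.json") should produce the same structure as the original script, with the Icelandic characters written literally. If the text file was saved with a byte-order mark (some Windows editors add one), using encoding="utf-8-sig" for the input file skips it.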
As the json module documentation says:
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
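The effect is easy to see on a single entry from words.txt. This minimal sketch shows that both settings produce valid JSON that loads back to the same data; ensure_ascii only changes how the non-ASCII characters are spelled in the output.

```python
import json

entry = {"word": "good morning", "translation": "Góðan daginn"}

# Default (ensure_ascii=True): every non-ASCII character is escaped.
escaped = json.dumps(entry)
print(escaped)  # {"word": "good morning", "translation": "G\u00f3\u00f0an daginn"}

# ensure_ascii=False: ó and ð are written as-is.
as_is = json.dumps(entry, ensure_ascii=False)

# Both strings decode back to the same dictionary.
assert json.loads(escaped) == json.loads(as_is) == entry
```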