I am getting lost somewhere in converting Unicode to UTF-8. I am trying to define some JSON containing Unicode characters and write it to a file. When printing to the terminal, the character is shown as '\u2606'. When I look at the file, the character is encoded as '\\u2606' (note the double backslash). Could someone point me in the right direction regarding these encoding issues?
# encoding=utf8
import json
data = {"summary" : u"This is a unicode character: ☆"}
print data
decoded_data = unicode(data)
print decoded_data
with open('decoded_data.json', 'w') as outfile:
    json.dump(decoded_data, outfile)
I also tried adding the following snippet at the top of my file, but that did not help either.
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
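First, you are printing the representation of a dictionary, and in Python 2 that representation uses only ASCII characters, escaping any other character as \uxxxx. Calling unicode(data) turns the dictionary into that same escaped text, so json.dump then serializes a string that contains a literal backslash and escapes it again, which gives the double backslash in the file. json.dump behaves similarly by default: it tries to use only ASCII characters. You can force it to emit the Unicode characters directly by passing ensure_ascii=False to json.dumps on the original dictionary: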
json_data = json.dumps(data, ensure_ascii=False)
with open('decoded_data.json', 'w') as outfile:
    outfile.write(json_data.encode('utf8'))
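As a rough sketch of the same idea (not part of the original answer), the write can also go through io.open, which does the UTF-8 encoding itself, so no manual .encode('utf8') is needed. The helper names write_json_utf8 and read_json_utf8 and the temporary path are made up for illustration. The code is written with Python 3 in mind and should also run on Python 2.7 as long as the data contains unicode strings:

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile


def write_json_utf8(obj, path):
    # Hypothetical helper: serialize obj and write it as UTF-8 text.
    # ensure_ascii=False keeps non-ASCII characters instead of escaping them.
    text = json.dumps(obj, ensure_ascii=False)
    # io.open encodes the text to UTF-8 while writing.
    with io.open(path, "w", encoding="utf-8") as outfile:
        outfile.write(text)


def read_json_utf8(path):
    # Hypothetical helper: read the file back to check the round trip.
    with io.open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)


# \u2606 is the star character from the question.
data = {u"summary": u"This is a unicode character: \u2606"}

# Assumed location: a fresh temporary directory, so nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), "decoded_data.json")
write_json_utf8(data, path)
restored = read_json_utf8(path)

# Serializing text that already contains a literal backslash escapes it
# again, which is how a double backslash like the one in the question arises.
already_escaped = u"\\u2606"  # six characters: backslash, u, 2, 6, 0, 6
double_escaped = json.dumps(already_escaped)
```

Here restored should compare equal to data, and the file should contain the star itself rather than an escape sequence. Passing the dictionary directly, instead of its unicode(...) string form, is what avoids the second round of escaping.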