简体   繁体   中英

Python encoding unicode<>utf-8

So I am getting lost somewhere in converting unicode to utf-8. I am trying to define some JSON containing unicode characters, and writing them to file. When printing to the terminal the character is represented as '\☆'. When having a look at the file the character is encoded to '\☆', note the double backslash. Could someone point me into the right direction regarding these encoding issues?

# encoding=utf8

import json

data = {"summary" : u"This is a unicode character: ☆"}
print data

decoded_data = unicode(data)
print decoded_data

with open('decoded_data.json', 'w') as outfile:
    json.dump(decoded_data, outfile)

I tried adding the following snippet to the head of my file, but this had no success neither.

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

First you are printing the representation of a dictionary, and python only uses ascii characters and escapes any other character with \\uxxxx .

The same is with json.dump trying to only use ascii characters. You can force json.dump to use unicode with:

json_data = json.dumps(data, ensure_ascii=False)
with open('decoded_data.json', 'w') as outfile:
    outfile.write(json_data.encode('utf8'))

I think you can also refer to this link.It is also really useful

Set Default Encoding

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM