
ast.literal_eval converts a Unicode code point from \uxxxx to \\uxxxx. How can I avoid this?

For example, here is the code that handles this JSON input:

json.loads(u"\"{\\\"title\\\": \\\"\\\\u5927\\\"}\"")

json.loads converts it to a unicode string:

{"title": "\u5927"}

Here is the code that handles that unicode string:

ast.literal_eval(json.loads(u"\"{\\\"title\\\": \\\"\\\\u5927\\\"}\""))

ast.literal_eval converts it to a dictionary:

{'title': '\\u5927'}

But what I want is a dictionary with the following content:

{'title': '\u5927'}

json.loads("{\"title\": \"\\u5927\"}") returns a dictionary, so you don't need ast.literal_eval at all.

import json

d = json.loads("{\"title\": \"\\u5927\"}")

print d
{u'title': u'\u5927'}

type(d)
Out[2]: dict

For the full JSON-to-Python conversion table used by json.loads(), see the json module documentation.
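As a rough illustration of that conversion, here is a minimal sketch. It assumes Python 3 (the snippets above are Python 2), and the `sample` document and its keys other than `title` are made up for the example.

```python
import json

# Hypothetical JSON text covering each basic JSON type.
# "\\u5927" in this Python literal is the six characters \u5927 in the JSON text.
sample = '{"title": "\\u5927", "tags": ["a", "b"], "count": 3, "ratio": 0.5, "ok": true, "extra": null}'

decoded = json.loads(sample)

# Show which Python type each JSON value was converted to.
for key, value in decoded.items():
    print(key, type(value).__name__)
```

The JSON object becomes a dict, the array a list, the string a str with the \u5927 escape already decoded, the numbers int and float, true becomes True, and null becomes None.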

If you're trying to parse a file, use json.load() without the s like this:

with open('your-file.json') as f:
    # you can change the encoding to the one you need
    print json.load(f, encoding='utf-8')

Test:

from io import StringIO

s = StringIO(u"{\"title\": \"\\u5927\"}")

print json.load(s)
{u'title': u'\u5927'}
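For readers on Python 3, the file case looks slightly different, because the `encoding` keyword of json.load() was removed in Python 3.9. The sketch below assumes Python 3 and writes a temporary file with made-up content so it can run on its own; in real code the path would be your own file.

```python
import json
import os
import tempfile

# Write a small JSON file (hypothetical content) so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"title": "\\u5927"}')
    path = tmp.name

try:
    # In Python 3 the encoding is given to open(), not to json.load().
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
finally:
    # Clean up the temporary file.
    os.remove(path)

print(data)
```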

Update

The OP has completely changed the JSON to be parsed, so here is another solution: parse the JSON twice.

json.loads(json.loads(u"\"{\\\"title\\\": \\\"\\\\u5927\\\"}\""))
Out[6]: {u'title': u'\u5927'}

This works because the input is a JSON string literal whose content is itself JSON text. The first json.loads decodes the outer string and returns the inner JSON text; the second json.loads deserializes that text into a dictionary.
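The two passes can be made visible by keeping the intermediate value. This is a minimal sketch assuming Python 3; the names `raw`, `inner`, and `result` are only for illustration.

```python
import json

# The outer text is a JSON string literal whose value is itself JSON text.
raw = "\"{\\\"title\\\": \\\"\\\\u5927\\\"}\""

inner = json.loads(raw)      # first pass: JSON string -> Python str holding JSON text
result = json.loads(inner)   # second pass: JSON object -> Python dict

print(type(inner).__name__, inner)
print(type(result).__name__, result)
```

After the first pass, `inner` still contains a literal backslash followed by u5927. Only the second pass interprets that escape and produces the actual character.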


 