
How to decode an invalid JSON string in Python

Is there a way to decode a JSON-like string in Python?

I have this string:

'{ hotel: { id: "123", name: "hotel_name"} }'

It is not valid JSON, so I can't decode it directly with Python's json module, which only accepts strictly valid JSON such as:

 '{ "hotel": { "id": "123", "name": "hotel_name"} }'

where the property names are quoted as strings.
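To make the failure concrete, here is a minimal sketch using only the standard library's json module. The variable names (valid, invalid, strict_error) are illustrative and not part of the original question.

```python
import json

valid = '{ "hotel": { "id": "123", "name": "hotel_name"} }'
invalid = '{ hotel: { id: "123", name: "hotel_name"} }'

# Quoted property names parse without trouble.
parsed = json.loads(valid)
print(parsed["hotel"]["name"])

# Unquoted property names are rejected by the strict parser.
try:
    json.loads(invalid)
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = exc
    print("rejected:", exc.msg)
```

json.JSONDecodeError is a subclass of ValueError, so catching ValueError works as well.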

Use the demjson module, which can decode in non-strict mode.

In [1]: import demjson
In [2]: demjson.decode('{ hotel: { id: "123", name: "hotel_name"} }')
Out[2]: {u'hotel': {u'id': u'123', u'name': u'hotel_name'}}

You could try a wrapper for a JavaScript engine, such as PyV8.

import PyV8
ctx = PyV8.JSContext()
ctx.enter()
# Note that we need to insert an assignment here ('a = '); otherwise the engine raises a syntax error.
js = 'a = ' + '{ hotel: { id: "123", name: "hotel_name"} }'
a = ctx.eval(js)
a.hotel.id  # evaluates to '123'

@vartec has already pointed out demjson, which works well for slightly invalid JSON. For data that's even less JSON-compliant, I've written barely_json:

from barely_json import parse
print(parse('[no, , {complete: yes, where is my value?}]'))

prints

[False, '', {'complete': True, 'where is my value?': ''}]

This is neither elegant nor robust, but you may be able to kludge it with something like:

import re
string = '{ hotel: { id: "123", name: "hotel_name"} }'
kludged = re.sub('(?i)([a-z_].*?):', r'"\1":', string)
# { "hotel": { "id": "123", "name": "hotel_name"} }
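A slightly stricter variant of the same idea is sketched below. It only quotes a bare identifier that follows '{' or ',' and precedes ':', then hands the result to json.loads. The helper name quote_bare_keys is hypothetical, and the approach is still a kludge: it will misfire if a string value itself contains text such as ', word:'.

```python
import json
import re

def quote_bare_keys(text):
    """Wrap unquoted identifier keys in double quotes (hypothetical helper)."""
    # A key is an identifier that follows '{' or ',' and precedes ':'.
    return re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)', r'\1"\2"\3', text)

raw = '{ hotel: { id: "123", name: "hotel_name"} }'
fixed = quote_bare_keys(raw)
data = json.loads(fixed)
print(data["hotel"]["id"])
```

Anchoring on the preceding '{' or ',' keeps the pattern from touching a word inside a value unless that word happens to sit between a comma and a colon.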

pyparsing may also do what you want: its parsePythonValue.py example could work as-is or with fairly minor modification, and its jsonParser.py example could be modified so that keys do not need to be quoted.
