简体   繁体   中英

How to parse JSON files with double-quotes inside strings in Python?

I'm trying to read a JSON file in Python. Some of the lines have strings with double quotes inside:

{"Height__c": "8' 0\"", "Width__c": "2' 8\""}

Using a raw string literal produces the right output:

json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

But my string comes from a file, ie:

s = f.readline()

Where:

>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'

And json throws the following exception:

json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)

Also,

>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)

Fails, but assigning the raw literal works:

>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

Do I need to write a custom Decoder?

The data file you have does not escape the nested quotes correctly; this can be hard to repair.

If the nested quotes follow a pattern; eg always follow a digit and are the last character in each string you can use a regular expression to fix these up. Given your sample data, if all you have is measurements in feet and inches, that's certainly doable:

import re
from functools import partial

repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')

json.loads(repair_nested(s))

Demo:

>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM