
Remove quotation marks nested between quotation marks

I have a string in JSON-like format, like this:

{"1":"abc"abc"abc","2":"xyz"xyz"xyz"}

To transform it into JSON data, I need to remove the '"' characters that sit inside the quoted values and get a string like this:

{"1":"abcabcabc","2":"xyzxyzxyz"}

I tried using re.sub to do that, but it failed. Could anyone help me with that? My script is below:

import re

a='{"1":"abc"de"fg","2":"xyz"xyz"xyz"}'
r = re.compile(r'(?<!\:)(?<=.+)"|(?<!,)"|"(?!}|,)')
b = r.sub('', a)
print(b)

When I run the script, I get this error:

Traceback (most recent call last):
  File "./_t1.py", line 5, in <module>
    r = re.compile(r'(?<!\:)(?<=.+)"|(?<!,)"|"(?!}|,)')
  File "/home/emc/ssd/anaconda3/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/home/emc/ssd/anaconda3/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/emc/ssd/anaconda3/lib/python3.6/sre_compile.py", line 566, in compile
    code = _code(p, flags)
  File "/home/emc/ssd/anaconda3/lib/python3.6/sre_compile.py", line 551, in _code
    _compile(code, p.data, flags)
  File "/home/emc/ssd/anaconda3/lib/python3.6/sre_compile.py", line 187, in _compile
    _compile(code, av, flags)
  File "/home/emc/ssd/anaconda3/lib/python3.6/sre_compile.py", line 160, in _compile
    raise error("look-behind requires fixed-width pattern")
sre_constants.error: look-behind requires fixed-width pattern
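The traceback comes from a restriction in Python's re module: a look-behind assertion must match a fixed number of characters, and (?<=.+) can match any number of them. Below is a minimal sketch, not taken from the answer, of a fixed-width alternative that stays close to the question's idea of removing quotes that are not next to a separator. It assumes that the only quotes touching {, :, , or } are the structural ones; the name inner_quote is illustrative.

```python
import re

a = '{"1":"abc"de"fg","2":"xyz"xyz"xyz"}'

# Variable-width look-behind, as in the question's (?<=.+), is rejected.
try:
    re.compile(r'(?<=.+)"')
except re.error as exc:
    print('rejected:', exc)

# Fixed-width alternative (an assumption, not the answer's method): remove each
# quote that is not preceded by '{', ':' or ',' and not followed by ':', ',' or '}'.
inner_quote = re.compile(r'(?<![{:,])"(?![:,}])')
b = inner_quote.sub('', a)
print(b)  # {"1":"abcdefg","2":"xyzxyzxyz"}
```

Like the answer's regex, this breaks as soon as a value itself has one of those characters right next to a quote.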

The following works as long as your data doesn't contain , or : inside the keys and values, because we need some anchors to untangle this mess:

import re

a='{"1":"abc"de"fg","2":"xyz"xyz"xyz"}'

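# Match a quoted run that contains no comma or colon, then strip
# every quote from the captured inner part.
b = re.sub(r'"((?:[^,:]|")*)"', lambda m: '"{}"'.format(m.group(1).replace('"', '')), a)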

>>> b
'{"1":"abcdefg","2":"xyzxyzxyz"}'
  • The regex matches the string between the outer quotes, and the replacement function removes the inner quotes.
  • The inner non-capturing group (?:[^,:]|") matches quotes or any character other than a comma or colon, so a match cannot run past the , and : separators.
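To see what the replacement function receives, the matches can be listed with re.finditer before anything is replaced. A small sketch on the answer's sample string: group(0) is the whole quoted run, and group(1) is the part whose inner quotes get stripped.

```python
import re

a = '{"1":"abc"de"fg","2":"xyz"xyz"xyz"}'
pattern = re.compile(r'"((?:[^,:]|")*)"')

# Collect (whole match, captured group) pairs without replacing anything.
pairs = [(m.group(0), m.group(1)) for m in pattern.finditer(a)]
for whole, inner in pairs:
    print(whole, '->', inner)
# Prints:
# "1" -> 1
# "abc"de"fg" -> abc"de"fg
# "2" -> 2
# "xyz"xyz"xyz" -> xyz"xyz"xyz
```

Note that [^,:] already includes the quote character, so the |" branch does not change which strings the pattern matches; greedy matching plus backtracking is what makes each run end at the last quote before a separator.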

Now b can be parsed as JSON:

>>> import json
>>> json.loads(b)
{'1': 'abcdefg', '2': 'xyzxyzxyz'}
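The first approach can be wrapped so that parse failures are easy to spot. A minimal sketch; strip_inner_quotes and try_parse are hypothetical helper names, and the last line illustrates the stated assumption that values contain no commas.

```python
import json
import re


def strip_inner_quotes(text):
    """Hypothetical helper wrapping the answer's first regex: remove the
    quotes nested inside each quoted run that contains no ',' or ':'."""
    return re.sub(r'"((?:[^,:]|")*)"',
                  lambda m: '"{}"'.format(m.group(1).replace('"', '')),
                  text)


def try_parse(text):
    """Return the parsed object, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


raw = '{"1":"abc"de"fg","2":"xyz"xyz"xyz"}'
print(try_parse(raw))                      # None: the raw string is invalid
print(try_parse(strip_inner_quotes(raw)))  # {'1': 'abcdefg', '2': 'xyzxyzxyz'}

# A comma inside a value breaks the assumption; the string stays invalid.
print(try_parse(strip_inner_quotes('{"1":"a,b"c"}')))  # None
```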

Now, what if the string contains :? The solution above no longer works, so we have to adapt it:

  • split the string on ":" (allowing optional spaces around the colon)
  • apply a regex similar to the one above (with just the leading quote removed) to every element of the split list
  • join the elements back together with ":"

Like this:

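import re
import json

# a lot of colons in keys & values
a = '{"1":"a:bc"de"fg","2:":"xy::z"xyz"xyz"}'

# split on the ":" separators (tolerating spaces), strip inner quotes
# from each chunk, then glue the chunks back together with ":"
b = '":"'.join(
    re.sub(r'((?:[^,:]|")*)"', lambda m: '{}"'.format(m.group(1).replace('"', '')), x)
    for x in re.split(r'"\s*:\s*"', a)
)

print(json.loads(b))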

This results in proper JSON parsing:

{'1': 'a:bcdefg', '2:': 'xy::zxyzxyz'}
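The expression above packs three steps into a single statement. Unpacked into separate statements (a sketch; the names parts, clean and cleaned are illustrative), the intermediate values can be inspected:

```python
import re

a = '{"1":"a:bc"de"fg","2:":"xy::z"xyz"xyz"}'

# Step 1: split on the '":"' separator, allowing spaces around the colon.
parts = re.split(r'"\s*:\s*"', a)
print(parts)    # ['{"1', 'a:bc"de"fg","2:', 'xy::z"xyz"xyz"}']


def clean(chunk):
    """Step 2: strip the quotes inside every run that ends with '"'."""
    return re.sub(r'((?:[^,:]|")*)"',
                  lambda m: '{}"'.format(m.group(1).replace('"', '')),
                  chunk)


cleaned = [clean(p) for p in parts]
print(cleaned)  # ['{"1', 'a:bcdefg","2:', 'xy::zxyzxyz"}']

# Step 3: glue the chunks back together with the separator.
b = '":"'.join(cleaned)
print(b)        # {"1":"a:bcdefg","2:":"xy::zxyzxyz"}
```

The colons inside "a:bc" and "xy::z" survive because the split only fires on a colon with a quote on both sides.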
