I'm trying to refactor some Python code that parses a string in the following format:
thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5
where the end result is a structured result like:
{
"thing_1": True,
"thing_2": True,
"things_3": {
"thing_31": True,
"thing_32": True,
"thing_34": {
"thing_341": True
}
},
"thing_5": True,
}
In practice this is a field list for an API request (where only the given fields are returned) with support for defining required fields for nested objects.
I've been trying various approaches to writing the regex (if that is possible at all). My idea was to parse the brackets' contents first, keeping the element that precedes each bracket, until only the outer top-level list is left. But that proves harder to express in regex than to 'say' in English.
Some notable attempts are below but the grouping is all wrong there.
(([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)+)
([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)
(?<=[a-z0-9_])(\[[a-z0-9,_*]+\]*)
Is this even possible to do in an elegant way?
Thank you!
Since you already have a parser and just wanted to know of an alternative way, you may consider the following:
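import json
import re

s = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"
# Leaf names (not followed by "[" or another word char): quote them and append ": true"
s = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s)
# Parent names (followed by "["): quote them and append ":", then turn [...] into {...}
js = json.loads('{' + re.sub(r'\w+(?=\[)', r'"\g<0>":', s).replace('[', '{').replace(']', '}') + '}')
print(json.dumps(js, indent=4, sort_keys=True))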
Output:
{
"thing_1": true,
"thing_2": true,
"thing_5": true,
"things_3": {
"thing_31": true,
"thing_32": true,
"thing_34": {
"thing_341": true
}
}
}
See the Python demo online.
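The printed output is JSON, so it shows true rather than True. To check that the parsed object really matches the Python structure from the question, the same two substitutions can be wrapped in a function. This is only a sketch: the name fields_to_dict is made up for illustration, and it assumes the input is well formed (balanced brackets, names made of word characters only).

```python
import json
import re


def fields_to_dict(s):
    """Convert a field list such as 'a,b[c]' into a nested dict (hypothetical helper)."""
    # Leaf names: word chunks not followed by "[" or by another word character
    s = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s)
    # Parent names: word chunks immediately followed by "["
    s = re.sub(r'\w+(?=\[)', r'"\g<0>":', s)
    # Square brackets become JSON object braces
    s = s.replace('[', '{').replace(']', '}')
    # json.loads turns JSON true into Python True
    return json.loads('{' + s + '}')


result = fields_to_dict(
    "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"
)
assert result["things_3"]["thing_34"] == {"thing_341": True}
```

Malformed input (for example an unclosed bracket) will surface as a json.JSONDecodeError from json.loads, which may or may not be the error you want to show to API users.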
NOTES:
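re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s) - wraps all chunks of 1+ word chars that are not immediately followed with [ in double quotation marks, and appends : true after them. The \w inside the lookahead keeps the regex from backtracking and matching only the first part of a name that is followed with [
re.sub(r'\w+(?=\[)', r'"\g<0>":', s) - wraps all chunks of 1+ word chars that ARE immediately followed with [ in double quotation marks, and appends : after them
.replace('[', '{').replace(']', '}') - replaces all [ with { and ] with }
'{' + ... + '}' - wraps the whole string in {...} so that it becomes a JSON object
json.loads(...) - parses that JSON string into a Python dict
json.dumps(js, indent=4, sort_keys=True) - pretty-prints the dict as JSON with sorted keys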
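Nested brackets are not something a plain regex describes well, so if the regex and JSON round trip feels too clever, a single pass with a stack does the same job. The following is a minimal sketch rather than a definitive implementation: parse_fields is a hypothetical name, and it assumes names contain no whitespace and that stray commas can be ignored.

```python
def parse_fields(s):
    """Parse 'a,b[c,d[e]]' into nested dicts with True leaves (hypothetical helper)."""
    root = {}
    stack = [root]  # stack[-1] is the dict currently being filled
    name = ""       # characters of the field name read so far

    for ch in s:
        if ch == "[":
            if not name:
                raise ValueError("'[' without a field name")
            # The name just read owns a nested object: create it and descend
            child = {}
            stack[-1][name] = child
            stack.append(child)
            name = ""
        elif ch in ",]":
            if name:
                # A name terminated by "," or "]" is a leaf field
                stack[-1][name] = True
                name = ""
            if ch == "]":
                if len(stack) == 1:
                    raise ValueError("unbalanced ']'")
                stack.pop()  # go back to the enclosing object
        else:
            name += ch

    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    if name:
        root[name] = True  # last top-level field has no trailing comma
    return root


fields = parse_fields("thing_1,things_3[thing_31,thing_34[thing_341]],thing_5")
assert fields["things_3"]["thing_34"]["thing_341"] is True
```

Compared with the regex version, this gives you control over the error messages for malformed field lists and keeps the fields in their original order without going through JSON.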