Reg-ex to parse a list with bracketed nested sub lists?

Question

I'm trying to refactor some Python code that parses a string in the following format:

thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5

where the end result is a structured result like:

{
   "thing_1": True,
   "thing_2": True,
   "things_3": {
      "thing_31": True,
      "thing_32": True,
      "thing_34": {
          "thing_341": True
      }
   },
   "thing_5": True,
}

In practice this is a field list for an API request (where only the given fields are returned) with support for defining required fields for nested objects.

I've been trying various approaches to writing the regex (if that is possible at all). My idea was to parse the contents of the brackets first, while retaining the element immediately before each bracket, so that in the end I'm left with just the outer top-level list. But that proves more difficult to express in regex than it is to say in English.

Some notable attempts are below but the grouping is all wrong there.

(([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)+)

([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)

(?<=[a-z0-9_])(\[[a-z0-9,_*]+\]*)

Is this even possible to do in an elegant way?

Thank you!

Answer 1

Since you already have a parser and just want to know of an alternative way, you may consider rewriting the string into JSON with two regex substitutions and then loading it:

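import json, re

s = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"
# Leaf names (not followed by "[" or another word character) become "name": true
s = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s)
# Parent names (followed by "[") become "name": and the brackets become braces
js = json.loads('{' + re.sub(r'\w+(?=\[)', r'"\g<0>":', s).replace('[', '{').replace(']', '}') + '}')
print(json.dumps(js, indent=4, sort_keys=True))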

Output:

{
    "thing_1": true,
    "thing_2": true,
    "thing_5": true,
    "things_3": {
        "thing_31": true,
        "thing_32": true,
        "thing_34": {
            "thing_341": true
        }
    }
}

See the Python demo online.

NOTES:

re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s) - wraps all chunks of 1+ word chars that are not immediately followed by [ in double quotation marks, and appends : true after them (the \w inside the lookahead stops a match from ending in the middle of a name)
re.sub(r'\w+(?=\[)', r'"\g<0>":', s) - wraps all chunks of 1+ word chars that ARE immediately followed by [ in double quotation marks, and appends : after them
.replace('[', '{').replace(']', '}') replaces all [ with { and ] with }
To parse the string as JSON, the result is wrapped with {...}
After parsing with json.loads(...), json.dumps(js, indent=4, sort_keys=True) pretty-prints the resulting object as JSON with sorted keys.
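
To make the notes above easier to follow, here is a minimal sketch that runs the same substitutions one at a time and keeps each intermediate string. The function name parse_fields and the step1/step2/step3 variable names are made up for this illustration and are not part of the original snippet. It assumes field names consist of word characters only (letters, digits, underscore) and that the brackets are balanced.

```python
import json
import re


def parse_fields(spec):
    """Turn 'a,b[c,d[e]]' into a nested dict via regex rewriting plus json.loads.

    Hypothetical helper for illustration; returns the intermediate strings too.
    """
    # Step 1: leaf names (not followed by '[' or another word char) get ': true'.
    step1 = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', spec)
    # Step 2: parent names (directly followed by '[') get quoted plus a ':'.
    step2 = re.sub(r'\w+(?=\[)', r'"\g<0>":', step1)
    # Step 3: brackets become braces, and the whole string is wrapped in {...}.
    step3 = '{' + step2.replace('[', '{').replace(']', '}') + '}'
    return step1, step2, step3, json.loads(step3)


spec = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"
step1, step2, step3, fields = parse_fields(spec)
for label, value in (("step 1", step1), ("step 2", step2), ("step 3", step3)):
    print(label, "->", value)

# json.loads maps JSON true to Python True, so the result is a plain nested dict.
print(fields["things_3"]["thing_34"])
```

Since json.loads converts JSON true into Python True, the returned dict already has the shape asked for in the question, and the json.dumps call is only needed for pretty-printing.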

solution1
1 ACCEPTED 2019-11-14 11:57:40
