Regex to parse a list with bracketed nested sub-lists?

I'm trying to refactor some Python code that parses a string in the following format:

thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5

where the end result is a nested structure like:

{
   "thing_1": True,
   "thing_2": True,
   "things_3": {
      "thing_31": True,
      "thing_32": True,
      "thing_34": {
          "thing_341": True
      }
   },
   "thing_5": True,
}

In practice, this is a field list for an API request (only the listed fields are returned), with support for specifying which fields of nested objects are required.
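To make that purpose concrete, here is a minimal sketch of how a nested field spec in the shape above could be applied to an API record. The helper name select_fields and the sample record are hypothetical; the sketch assumes True marks a field to keep as-is and a dict marks a nested object whose own fields are listed.

```python
def select_fields(record: dict, spec: dict) -> dict:
    """Keep only the fields named in spec (hypothetical helper).

    True keeps the field's whole value; a nested dict recurses into the
    corresponding sub-object. Requested fields missing from record are skipped.
    """
    result = {}
    for name, sub_spec in spec.items():
        if name not in record:
            continue  # requested field absent from the record: skip it
        value = record[name]
        if sub_spec is True:
            result[name] = value  # leaf: keep the whole value
        elif isinstance(value, dict):
            result[name] = select_fields(value, sub_spec)  # nested: recurse
        # a nested spec applied to a non-dict value is dropped
    return result


# Hypothetical API record with some fields that were not requested
record = {
    "thing_1": 1,
    "thing_2": 2,
    "things_3": {
        "thing_31": "a",
        "thing_32": "b",
        "thing_33": "not requested",
        "thing_34": {"thing_341": "x", "thing_342": "not requested"},
    },
    "thing_5": 5,
    "thing_6": "not requested",
}
# Field spec in the same shape as the structure above
spec = {
    "thing_1": True,
    "thing_2": True,
    "things_3": {
        "thing_31": True,
        "thing_32": True,
        "thing_34": {"thing_341": True},
    },
    "thing_5": True,
}
print(select_fields(record, spec))
```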

I've been trying various approaches to writing the regex (if that is possible at all). My idea was to parse the bracket contents first, keeping the name that precedes each bracket, until only the outer top-level list is left. But that turns out to be much harder to express as a regex than to describe in English.
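The "innermost brackets first" idea can be expressed with Python's re module by reducing one nesting level at a time: match a name followed by a bracket group that contains no further brackets, parse that group, and replace it with a placeholder token until no brackets remain. This is a minimal sketch, assuming field names contain only word characters and the brackets are balanced; the function name parse_fields and the placeholder format are illustrative choices, not part of the original code.

```python
import re

# A name followed by a bracket group with no brackets inside it,
# i.e. an innermost sub-list such as thing_34[thing_341]
INNERMOST = re.compile(r'(\w+)\[([^\[\]]*)\]')


def parse_fields(text: str) -> dict:
    """Parse 'a,b[c,d[e]],f' into nested dicts by collapsing innermost groups first."""
    groups = {}  # placeholder token -> (name, parsed sub-dict)

    def parse_flat(items: str) -> dict:
        # Parse a comma-separated list without brackets; an item may be a
        # placeholder that stands for an already-collapsed sub-list.
        result = {}
        for item in filter(None, items.split(',')):
            if item in groups:
                name, sub = groups[item]
                result[name] = sub
            else:
                result[item] = True
        return result

    def collapse_group(match: re.Match) -> str:
        # \x00 is not a word character, so a token never equals a field name
        token = f'\x00{len(groups)}\x00'
        groups[token] = (match.group(1), parse_flat(match.group(2)))
        return token

    # Each pass removes one nesting level; stop once nothing matched.
    while True:
        text, count = INNERMOST.subn(collapse_group, text)
        if count == 0:
            break
    return parse_flat(text)


print(parse_fields("thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"))
# {'thing_1': True, 'thing_2': True, 'things_3': {'thing_31': True, 'thing_32': True, 'thing_34': {'thing_341': True}}, 'thing_5': True}
```

The loop is what makes this work: Python's re module has no recursive patterns, so a single match cannot span arbitrarily deep nesting, but repeatedly collapsing the innermost level can.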

Some notable attempts are below, but the grouping is wrong in all of them:

(([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)+)

([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)

(?<=[a-z0-9_])(\[[a-z0-9,_*]+\]*)
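One way to see why the grouping goes wrong is to run the second attempt through re.findall on the sample string. The character class stops at the first inner [, so the captured sub-list is cut off and the closing ]] are never consumed; a single pattern in Python's re module has no way to count bracket depth.

```python
import re

s = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"

# Second attempt from the question: a name, then "[" and a run of list characters
attempt = re.compile(r'([a-z0-9_]+)(\[[a-z0-9,_*]+\]*)')

# The bracket group ends at the inner "[" of thing_34[...]; \]* then matches
# nothing, so the nested sub-list and the closing "]]" are left over.
print(attempt.findall(s))
# [('things_3', '[thing_31,thing_32,thing_34')]
```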

Is this even possible to do in an elegant way?

Thank you!

Since you already have a parser and are just looking for an alternative approach, you could consider:

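import json
import re

s = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"
# Leaf names (not followed by "[") become "name": true
s = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s)
# Names that open a sub-list become "name":, brackets become braces, and the whole is wrapped in {...}
js = json.loads('{' + re.sub(r'\w+(?=\[)', r'"\g<0>":', s).replace('[', '{').replace(']', '}') + '}')
print(json.dumps(js, indent=4, sort_keys=True))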

Output:

{
    "thing_1": true,
    "thing_2": true,
    "thing_5": true,
    "things_3": {
        "thing_31": true,
        "thing_32": true,
        "thing_34": {
            "thing_341": true
        }
    }
}

See the Python demo online.

NOTES:

  • re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s) - wraps in double quotation marks every run of 1+ word characters that is not immediately followed by [ or by another word character, and appends : true after it (the \w in the lookahead stops the regex from matching a prefix such as things_ of things_3)
  • re.sub(r'\w+(?=\[)', r'"\g<0>":', s) - wraps in double quotation marks every run of 1+ word characters that IS immediately followed by [, and appends : after it
  • .replace('[', '{').replace(']', '}') replaces all [ with { and all ] with }
  • To make the string a valid JSON object, the result is wrapped in {...}
  • After parsing with json.loads(...), json.dumps(js, indent=4, sort_keys=True) serializes the result as indented JSON with sorted keys (which is why things_3 is listed after thing_5), and print outputs it.
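To see each step described in the notes, the same transformation can be traced on the sample string. This minimal sketch simply re-runs the answer's substitutions one at a time and prints the intermediate strings; like the original, it assumes field names contain only word characters and the brackets are balanced.

```python
import json
import re

s = "thing_1,thing_2,things_3[thing_31,thing_32,thing_34[thing_341]],thing_5"

# Step 1: leaf names (not followed by "[" or another word char) -> "name": true
step1 = re.sub(r'\w+(?![\[\w])', r'"\g<0>": true', s)
print(step1)

# Step 2: names that open a sub-list -> "name":
step2 = re.sub(r'\w+(?=\[)', r'"\g<0>":', step1)
print(step2)

# Step 3: brackets become braces and the whole string is wrapped in {...}
step3 = '{' + step2.replace('[', '{').replace(']', '}') + '}'
print(step3)

# Step 4: the string is now valid JSON; json.loads turns true into Python True
result = json.loads(step3)
print(result["things_3"]["thing_34"])
```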

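Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.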