Parse Python nested list from string

I am parsing a file into Python lists and encountered a nested list like this:

{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }

I want to interpret it as a list while keeping the nesting, so the result should be the following Python list:

[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]]

I managed to build a correctly formatted string with the following:

l = """{    1   4{  2   0.0 }{  3   0.0 }{  4   0.0 }{  5   0.0 }   }"""
l = l.replace("{\t",",[").replace("\t}","]").replace("{","[").replace("}","]").replace("\t",",")[1:]

I could also apply l.strip("\\t") to get a list, but that does not work for the nested case: the result would be flattened, which I do not want.

I tried ast.literal_eval(l), but it fails on strings such as 2a.

Pyparsing has a built-in helper, nestedExpr, for parsing nested lists between opening and closing delimiters:

>>> import pyparsing as pp
>>> nested_braces = pp.nestedExpr('{', '}')
>>> t = """{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }"""
>>> print(nested_braces.parseString(t).asList())
[['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
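
Note that nestedExpr returns every token as a string and wraps the result in one extra outer list. If numeric values are wanted, a small post-processing step can convert them. The following is a minimal sketch, not part of pyparsing: the helper name convert is hypothetical, and the input is hard-coded from the output above so the snippet runs without pyparsing installed.

```python
def convert(token):
    """Recursively convert string tokens to int or float where possible."""
    if isinstance(token, list):
        return [convert(t) for t in token]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    # Tokens such as '2a' are neither int nor float, so keep them as strings.
    return token


# Output of nested_braces.parseString(t).asList(), copied from above.
parsed = [['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]

# Drop the extra outer list, then convert the tokens.
result = convert(parsed[0])
print(result)
```

Trying int before float keeps 1 and 4 as integers, while 0.0 falls through to float.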

You can develop your own parser using regular expressions. In your situation, it is not too difficult: match the enclosing curly brackets, then split the contents into items and evaluate each item recursively.

Here is an example (which is not perfect):

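import re

# Token patterns. RE_BRACE is greedy: it spans from the first "{" to the last "}".
RE_BRACE = r"\{.*\}"
RE_ITEM = r"\d+[a-z]+"  # labels such as 2a or 4c
RE_FLOAT = r"[-+]?\d*\.\d+"
RE_INT = r"\d+"

# Order matters: RE_ITEM and RE_FLOAT come before RE_INT so that the
# leading digits of 2a or 0.0 are not consumed as an integer.
find_all_items = re.compile(
    "|".join([RE_BRACE, RE_ITEM, RE_FLOAT, RE_INT]),
    flags=re.DOTALL).findall

def parse(text):
    # A braced group: strip the outer braces and parse each item recursively.
    mo = re.match(RE_BRACE, text, flags=re.DOTALL)
    if mo:
        content = mo.group()[1:-1]
        items = [parse(part) for part in find_all_items(content)]
        return items
    # A label such as 2a stays a string.
    mo = re.match(RE_ITEM, text, flags=re.DOTALL)
    if mo:
        return mo.group()
    mo = re.match(RE_FLOAT, text, flags=re.DOTALL)
    if mo:
        return float(mo.group())
    mo = re.match(RE_INT, text, flags=re.DOTALL)
    if mo:
        return int(mo.group())
    raise ValueError("Invalid text: {0}".format(text))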

Note: this parser cannot parse {1 {2} {3} 4} correctly, because the greedy pattern \{.*\} matches from the first opening brace to the last closing brace and therefore merges sibling groups. You need a recursive parser like pyparsing for that.

Demo:

s = '''{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }'''

l = parse(s)
print(l)

You get:

[1, 4, ['2a', 0.0, [3, 0.0, '4c', 0.0], 5, 0.0]]
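
The sibling groups are merged in this output, which is the limitation mentioned in the note. For comparison, here is a minimal sketch of a hand-written parser that keeps sibling groups separate by tokenizing first and tracking nesting with an explicit stack in place of recursion. It uses only the standard library; the names TOKEN_RE, to_value and parse_braces are hypothetical, and it assumes tokens are separated by whitespace or braces.

```python
import re

# A token is a single brace, or a run of characters that are neither
# whitespace nor braces (for example 1, 2a or 0.0).
TOKEN_RE = re.compile(r"[{}]|[^\s{}]+")


def to_value(token):
    """Convert a token to int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_braces(text):
    """Return the list of top-level items found in text."""
    stack = [[]]  # stack[0] collects the top-level items
    for token in TOKEN_RE.findall(text):
        if token == "{":
            stack.append([])  # open a new nested list
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            finished = stack.pop()
            stack[-1].append(finished)  # attach it to its parent
        else:
            stack[-1].append(to_value(token))
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return stack[0]


s = "{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }"
nested = parse_braces(s)[0]  # the input holds a single top-level group
print(nested)
```

Because each closing brace pops exactly one list off the stack, an input such as {1 {2} {3} 4} keeps {2} and {3} as separate sublists.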


 