Parsing repeated input with regular expressions in Python

Question

I'm new to Python and have never used regular expressions, but I've been asked to use them in a project. My input file uses the following format:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
}

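This repeats over and over, with different tags and different data of varying lengths. I need to convert it to JSON. Using unit tests, I've worked out how to do that reliably when I have just one of these structures, but I can't figure out how to reliably parse a file that contains thousands of them, more than one "tag" at a time.

Basically, I'm trying to work out how to repeatedly read the first line (the item name) and everything between the two curly braces that follow it, and ideally turn that into an iterable form I can work with. Can anyone give me some advice?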

Answer 1

If you have a string like this:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}

(and possibly many more tags)

and you just need a list with one entry per tag,

you can use the pattern (tag .+ {\n(?:.+\n)*?})

Your code would look like this:

import re

s = """tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}
"""

tags = re.findall(r'(tag .+ {\n(?:.+\n)*?})', s)

# Just to test out the tags
for tag in tags:
    print(tag)

Now you can run your own parsing on each tag.
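
Since the question asks for the item name and the brace contents in an iterable form, here is a minimal sketch of one way to do that per-tag parsing. It is only an illustration: the named groups `name` and `body`, the helper `parse_blocks`, the `sample` text and the output keys `name` and `subitems` are all my own assumptions, not something from the question, and the pattern assumes the body has no blank lines and that the closing brace sits at the start of its own line.

```python
import json
import re

# Hypothetical variant of the pattern above, with named groups for the
# item name and for the lines between the braces.
BLOCK_RE = re.compile(r'tag (?P<name>.+?) \{\n(?P<body>(?:.+\n)*?)\}')


def parse_blocks(text):
    """Yield one dict per 'tag <name> { ... }' block found in text."""
    for match in BLOCK_RE.finditer(text):
        # Keep each non-empty body line, stripped of surrounding whitespace.
        subitems = [
            line.strip()
            for line in match.group('body').splitlines()
            if line.strip()
        ]
        yield {'name': match.group('name'), 'subitems': subitems}


# Made-up sample input in the same shape as the question's format.
sample = """tag <first> {
    <a>
    <b> -> possible relationship
}

tag <second> {
    <c>
}
"""

records = list(parse_blocks(sample))
print(json.dumps(records, indent=2))
```

re.finditer yields matches one at a time, so you can loop over the blocks directly instead of building the whole list first. Deciding which subitems to keep (for example, skipping the irrelevant ones) would go inside the loop.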
