Parse a custom text file in Python

I have some text to parse. This is a condensed form of it:

apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}

The output should be as shown below:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa

So far the only thing I have come up with is a sort of breadth-first approach in Python, but I can't figure out how to collect all the children nested inside each block.

progInput = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""
progInputSplitToLines = progInput.split('\n')
childrenList = []
root = ""

def hasChildren():
    if "{" in progInputSplitToLines[0]:
        global root
        root = progInputSplitToLines[0].split(" ")[0]
    for e in progInputSplitToLines[1:]:
        if "=" in e:
            childrenList.append({e.split("=")[0].replace("    ", ""),e.split("=")[1].replace("    ", "")})
hasChildren()

PS: I looked into tree structures in Python and came across anytree ( https://anytree.readthedocs.io/en/latest/ ). Do you think it would help in my case?

Could you please help me out? I'm not very good at parsing text. Thanks a bunch in advance. :)

Since your file is in HOCON format, you can try using the pyhocon parser module to solve your problem.

Install: either run pip install pyhocon, or download the GitHub repo and install it manually with python setup.py install.

Basic usage:

from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file('text.conf')

print(conf)

Which gives the following nested structure:

ConfigTree([('apple', ConfigTree([('type', 'fruit'), ('varieties', ConfigTree([('color', 'red'), ('origin', 'usa')]))]))])

ConfigTree is a subclass of collections.OrderedDict, as seen in the source code, so it can be traversed like an ordinary dictionary.

UPDATE:

To get your desired output, you can write your own recursive generator that collects the path to every leaf value:

from pyhocon import ConfigFactory
from pyhocon.config_tree import ConfigTree

def config_paths(config):
    for k, v in config.items():
        if isinstance(v, ConfigTree):
            for k1, v1 in config_paths(v):
                yield (k,) + k1, v1
        else:
            yield (k,), v

config = ConfigFactory.parse_file('text.conf')
for k, v in config_paths(config):
    print('%s=%s' % ('.'.join(k), v))

Which outputs:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
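
If installing pyhocon is not an option, the same flattening can be done with a plain stack of block names, which is closer to what the question was attempting. This is only a minimal sketch, not a HOCON parser: it assumes each line is either "name {", "}", or "key=value", exactly as in the sample input, and the function name flatten_blocks is made up for illustration.

```python
def flatten_blocks(text):
    """Flatten 'name { key=value ... }' blocks into dotted 'path=value' lines.

    Assumes one construct per line, as in the sample input from the question.
    """
    path = []    # stack of the block names we are currently inside
    result = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            # Opening a block: push its name onto the stack.
            path.append(line[:-1].strip())
        elif line == "}":
            # Closing a block: go back to the parent.
            path.pop()
        elif "=" in line:
            # A leaf: prefix the key with every enclosing block name.
            key, value = line.split("=", 1)
            result.append(".".join(path + [key.strip()]) + "=" + value.strip())
    return result


sample = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

for entry in flatten_blocks(sample):
    print(entry)
```

The stack is what gives you "all the children within": every key=value line is reported under whatever blocks are open at that moment, however deeply they are nested.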

@RoadRunner How do I parse arrays inside the conf file? For example:

apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
loo = [
{
name = "abc"
},
{
name = "xyz"
}
]

I want to print all the name values, i.e. abc and xyz.
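
One possible approach, sketched under an assumption: pyhocon should hand HOCON arrays back as ordinary Python lists, so the recursive generator from the answer only needs an extra branch for lists. To keep the example runnable without pyhocon, the parsed variable below is a hand-written dict standing in for the parsed config (an assumption, not actual parser output); since ConfigTree is a dict subclass, the same function should accept the real result of ConfigFactory.parse_file as well.

```python
def config_paths(node, prefix=()):
    """Yield (path_tuple, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from config_paths(value, prefix + (key,))
    elif isinstance(node, list):
        # List items have no key, so their index is used as the path segment.
        for index, item in enumerate(node):
            yield from config_paths(item, prefix + (str(index),))
    else:
        yield prefix, node


# Hand-written stand-in for the parsed config (assumption, see above).
parsed = {
    "apple": {"type": "fruit", "varieties": {"color": "red", "origin": "usa"}},
    "loo": [{"name": "abc"}, {"name": "xyz"}],
}

# Just the names from the array:
names = [item["name"] for item in parsed["loo"]]
print(names)

# Or every leaf, arrays included:
for path, value in config_paths(parsed):
    print("%s=%s" % (".".join(path), value))
```

With this stand-in data the array entries come out as loo.0.name=abc and loo.1.name=xyz; using the index as a path segment is just one convention, so adjust it if you need a different output format.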
