
Parsing an existing config file

I have a config file in the following form:

protocol sample_thread {
    { AUTOSTART 0 }
    { BITMAP thread.gif }
    { COORDS {0 0} }
    { DATAFORMAT {
        { TYPE hl7 }
        { PREPROCS {
            { ARGS {{}} }
            { PROCS sample_proc }
        } }
    } } 
}

The real file may not have these exact fields, and I'd rather not have to describe the structure of the data to the parser before it parses.

I've looked for other configuration file parsers, but none that I've found seem to be able to accept a file with this syntax.

I'm looking for a module that can parse a file like this. Any suggestions?

If anyone is curious, the file in question was generated by Quovadx Cloverleaf.

pyparsing is pretty handy for quick and simple parsing like this. A bare minimum would be something like:

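import pyparsing

# a string is any run of characters other than braces and whitespace
string = pyparsing.CharsNotIn("{} \t\r\n")
group = pyparsing.Forward()
# parenthesize the alternation: "<<" binds more tightly than "|"
group << (pyparsing.Group(pyparsing.Literal("{").suppress() +
                          pyparsing.ZeroOrMore(group) +
                          pyparsing.Literal("}").suppress())
          | string)

toplevel = pyparsing.OneOrMore(group)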

Then use it as:

>>> toplevel.parseString(text)
['protocol', 'sample_thread', [['AUTOSTART', '0'], ['BITMAP', 'thread.gif'], 
['COORDS', ['0', '0']], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', 
[['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]

From there you can get as sophisticated as you want (parse numbers separately from strings, look for specific field names, etc). The above is pretty general: it just looks for strings (defined as any run of non-whitespace characters other than "{" and "}") and {}-delimited lists of strings.

Taking Brian's pyparsing solution another step, you can create a quasi-deserializer for this format by using the Dict class:

import pyparsing

# use Word instead of CharsNotIn, to do whitespace skipping
stringchars = pyparsing.printables.replace("{","").replace("}","")
string = pyparsing.Word( stringchars )
# define a simple integer, plus auto-converting parse action
integer = pyparsing.Word("0123456789").setParseAction(lambda t : int(t[0]))
group = pyparsing.Forward()
group << ( pyparsing.Group(pyparsing.Literal("{").suppress() +
    pyparsing.ZeroOrMore(group) +
    pyparsing.Literal("}").suppress())
    | integer | string )

toplevel = pyparsing.OneOrMore(group)

sample = """
protocol sample_thread {
    { AUTOSTART 0 }
    { BITMAP thread.gif }
    { COORDS {0 0} }
    { DATAFORMAT {
        { TYPE hl7 }
        { PREPROCS {
            { ARGS {{}} }
            { PROCS sample_proc }
        } }
    } } 
    }
"""

print(toplevel.parseString(sample).asList())

# Now define something a little more meaningful for a protocol structure, 
# and use Dict to auto-assign results names
LBRACE,RBRACE = map(pyparsing.Suppress,"{}")
protocol = ( pyparsing.Keyword("protocol") + 
             string("name") + 
             LBRACE + 
             pyparsing.Dict(pyparsing.OneOrMore(
                pyparsing.Group(LBRACE + string + group + RBRACE)
                ) )("parameters") + 
             RBRACE )

results = protocol.parseString(sample)
print(results.name)
print(results.parameters.BITMAP)
print(list(results.parameters.keys()))
print(results.dump())

Prints:

['protocol', 'sample_thread', [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]
sample_thread
thread.gif
['DATAFORMAT', 'COORDS', 'AUTOSTART', 'BITMAP']
['protocol', 'sample_thread', [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]
- name: sample_thread
- parameters: [['AUTOSTART', 0], ['BITMAP', 'thread.gif'], ['COORDS', [0, 0]], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]
  - AUTOSTART: 0
  - BITMAP: thread.gif
  - COORDS: [0, 0]
  - DATAFORMAT: [['TYPE', 'hl7'], ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]

I think you will get further faster with pyparsing.

-- Paul

I'll try to answer what I think is the missing question(s)...

Configuration files come in many formats. There are well-known formats such as *.ini or Apache config, and these tend to have many parsers available.

Then there are custom formats. That is what yours appears to be (it could be some well-defined format you and I have never seen before, but until you know what that is it doesn't really matter).

I would start with the software this came from and see if they have a programming API that can load/produce these files. If nothing is obvious, give Quovadx a call. Chances are someone has already solved this problem.

Otherwise you're probably on your own to create your own parser.

Writing a parser for this format would not be terribly difficult, assuming that your sample is representative of a complete example. It's a hierarchy of values where each node can contain either a value or a child hierarchy of values. Once you've defined the basic types that the values can contain, the parser is a very simple structure.

You could write this reasonably quickly using something like Lex/Flex, or just a straightforward parser in the language of your choosing.
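As a rough illustration of how small such a parser can be, here is a minimal stack-based sketch in plain Python. It assumes the format consists only of "{", "}" and whitespace-separated bare words, as in the sample; the function names tokenize and parse are made up for this example, and quoting, escapes and comments (if the real format has them) are not handled.

```python
import re

def tokenize(text):
    """Split the text into "{", "}" and bare-word tokens."""
    return re.findall(r"[{}]|[^\s{}]+", text)

def parse(tokens):
    """Turn a flat token list into nested Python lists."""
    stack = [[]]              # stack[-1] is the list currently being filled
    for tok in tokens:
        if tok == "{":
            stack.append([])  # open a new child list
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            done = stack.pop()
            stack[-1].append(done)  # attach the finished child to its parent
        else:
            stack[-1].append(tok)   # a bare word, kept as a string
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return stack[0]

sample = "protocol sample_thread { { AUTOSTART 0 } { COORDS {0 0} } { ARGS {{}} } }"
tree = parse(tokenize(sample))
print(tree)
# ['protocol', 'sample_thread', [['AUTOSTART', '0'], ['COORDS', ['0', '0']], ['ARGS', [[]]]]]
```

Every value comes back as a string; converting numbers or recognizing specific field names would be a separate pass over the nested lists.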

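You can easily write a script in Python that converts this to a Python dict. The format looks almost like hierarchical name/value pairs. The only problem seems to be COORDS {0 0}, where {0 0} isn't a name/value pair but a list, so who knows what other such cases the format contains. I think your best bet is to get a spec for the format and write a simple Python script to read it.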
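One way to deal with the pair-versus-list ambiguity is a heuristic, sketched below under stated assumptions: the input is already a nested list (such as the output of the pyparsing examples above), and a list is treated as a mapping only when every element is a two-item list whose first item is a string. The name to_dict is hypothetical, and the heuristic will guess wrong for a genuine list of two-item lists, which is why a spec for the format would be better.

```python
def to_dict(node):
    """Convert nested lists to dicts where they look like name/value pairs."""
    if not isinstance(node, list):
        return node                          # a bare word: keep as-is
    is_pairs = bool(node) and all(
        isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)
        for item in node
    )
    if is_pairs:
        return {name: to_dict(value) for name, value in node}
    return [to_dict(item) for item in node]  # a plain list, e.g. COORDS {0 0}

parsed = [
    ["AUTOSTART", "0"],
    ["COORDS", ["0", "0"]],
    ["DATAFORMAT", [["TYPE", "hl7"],
                    ["PREPROCS", [["ARGS", [[]]], ["PROCS", "sample_proc"]]]]],
]
config = to_dict(parsed)
print(config["COORDS"])                          # ['0', '0']
print(config["DATAFORMAT"]["PREPROCS"]["PROCS"])  # sample_proc
```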

Your config file is very similar to JSON (pretty much, replace all your "{" and "}" with "[" and "]"). Most languages have a built-in JSON parser (PHP, Ruby, Python, etc), and if not, there are libraries available to handle it for you.

If you cannot change the format of the configuration file, you can read the entire file contents as a string and replace all the "{" and "}" characters by whatever means you prefer. Then you can parse the string as JSON, and you're set.
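Note that swapping the brackets alone does not produce valid JSON, because JSON also requires strings to be quoted and array items to be separated by commas. A minimal sketch of the extra work, assuming bare words never contain braces or whitespace (to_json_text is a hypothetical helper name):

```python
import json
import re

def to_json_text(text):
    """Rewrite the brace syntax as JSON array text."""
    tokens = re.findall(r"[{}]|[^\s{}]+", text)
    out = []
    for tok in tokens:
        if tok == "{":
            piece = "["
        elif tok == "}":
            piece = "]"
        else:
            piece = json.dumps(tok)   # quote and escape the bare word
        # a comma is needed before any item that follows another item
        if out and piece != "]" and out[-1] != "[":
            out.append(",")
        out.append(piece)
    return "[" + "".join(out) + "]"   # wrap the top level in one array

sample = "protocol sample_thread { { BITMAP thread.gif } { COORDS {0 0} } }"
data = json.loads(to_json_text(sample))
print(data)
# ['protocol', 'sample_thread', [['BITMAP', 'thread.gif'], ['COORDS', ['0', '0']]]]
```

By this point most of the parsing has been done by hand, so going through JSON mainly helps if the data needs to be handed to another tool as JSON.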

I searched a little on the Cheese Shop, but I didn't find anything helpful for your example. Check the Examples page, and this specific parser (its syntax resembles yours a bit). I think this should help you write your own.

Look into LEX and YACC. A bit of a learning curve, but they can generate parsers for any language.

Maybe you could write a simple script that converts your config into an XML file and then read it using lxml, Beautiful Soup or anything else? Your converter could use PyParsing or regular expressions, for example.
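A possible shape for such a converter, as a sketch only: it assumes the config has already been turned into nested lists (for example by a pyparsing grammar) and uses the standard library's xml.etree.ElementTree. The element names config, group and value are invented for this example; generic names are used because arbitrary field names are not guaranteed to be valid XML tag names.

```python
import xml.etree.ElementTree as ET

def build(parent, node):
    """Append node to parent: lists become <group>, words become <value>."""
    if isinstance(node, list):
        child = ET.SubElement(parent, "group")
        for item in node:
            build(child, item)
    else:
        ET.SubElement(parent, "value").text = node

nested = ["protocol", "sample_thread",
          [["AUTOSTART", "0"], ["COORDS", ["0", "0"]]]]
root = ET.Element("config")
for item in nested:
    build(root, item)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The resulting text can then be read back with any XML library, for instance ET.fromstring(xml_text).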
