Any Python module for a customized BNF parser?

Friends,

I have a 'make'-like file that needs to be parsed. The grammar is something like:

samtools=/path/to/samtools
picard=/path/to/picard

task1: 
    des: description
    path: /path/to/task1
    para: [$global.samtools,
           $args.input,
           $path
          ]

task2: task1

Where $global contains the variables defined in the global scope, $path is a 'local' variable, and $args contains the key/value pairs passed in by users.

I would like to parse this file with a Python library, preferably one that returns a parse tree and reports any errors it finds. I found these: CodeTalker and yeanpypa. Can they be used in this case? Any other recommendations?

I had to guess what your makefile structure allows based on your example, but this should get you close:

from pyparsing import *
# elements of the makefile are delimited by line, so we must
# define skippable whitespace to include just spaces and tabs
ParserElement.setDefaultWhitespaceChars(' \t')
NL = LineEnd().suppress()

EQ,COLON,LBRACK,RBRACK = map(Suppress, "=:[]")
# identifiers may contain underscores after the first character as well
identifier = Word(alphas+'_', alphanums+'_')

symbol_assignment = Group(identifier("name") + EQ + empty + 
                          restOfLine("value"))("symbol_assignment")
symbol_ref = Word("$",alphanums+"_.")

def only_column_one(s,l,t):
    if col(l,s) != 1:
        raise ParseException(s,l,"not in column 1")
# task identifiers have to start in column 1
task_identifier = identifier.copy().setParseAction(only_column_one)

task_description = "des:" + empty + restOfLine("des")
task_path = "path:" + empty + restOfLine("path")
task_para_body = delimitedList(symbol_ref)
task_para = "para:" + LBRACK + task_para_body("para") + RBRACK
task_para.ignore(NL)
task_definition = Group(task_identifier("target") + COLON + 
        Optional(delimitedList(identifier))("deps") + NL +
        (
        Optional(task_description + NL) & 
        Optional(task_path + NL) & 
        Optional(task_para + NL)
        )
    )("task_definition")

makefile_parser = ZeroOrMore(
    symbol_assignment |
    task_definition |
    NL
    )


if __name__ == "__main__":
    test = """\
samtools=/path/to/samtools
picard=/path/to/picard

task1:  
    des: description 
    path: /path/to/task1 
    para: [$global.samtools, 
           $args.input, 
           $path 
          ] 

task2: task1 
"""

    # dump out what we parsed, including results names
    for element in makefile_parser.parseString(test):
        print(element.getName())
        print(element.dump())
        print()

Prints:

symbol_assignment
['samtools', '/path/to/samtools']
- name: samtools
- value: /path/to/samtools

symbol_assignment
['picard', '/path/to/picard']
- name: picard
- value: /path/to/picard

task_definition
['task1', 'des:', 'description ', 'path:', '/path/to/task1 ', 'para:', 
 '$global.samtools', '$args.input', '$path']
- des: description 
- para: ['$global.samtools', '$args.input', '$path']
- path: /path/to/task1 
- target: task1

task_definition
['task2', 'task1']
- deps: ['task1']
- target: task2

The dump() output shows which names you can use to get at the fields within the parsed elements, or to distinguish what kind of element you have. dump() is a handy, generic tool for printing whatever pyparsing has parsed. Here is some code that is more specific to your particular parser, showing how to use the field names either as dotted object references (element.target, element.deps, element.name, etc.) or as dict-style references (element[key]):

for element in makefile_parser.parseString(test):
    if element.getName() == 'task_definition':
        # print the target, with its dependencies on the same line if any
        if element.deps:
            print("TASK:", element.target, "DEPS:(" + ','.join(element.deps) + ")")
        else:
            print("TASK:", element.target)
        # the optional fields are only present if they were parsed
        for key in ('des', 'path', 'para'):
            if key in element:
                print(" ", key.upper()+":", element[key])

    elif element.getName() == 'symbol_assignment':
        print("SYM:", element.name, "->", element.value)

Prints:

SYM: samtools -> /path/to/samtools
SYM: picard -> /path/to/picard
TASK: task1
  DES: description 
  PATH: /path/to/task1 
  PARA: ['$global.samtools', '$args.input', '$path']
TASK: task2 DEPS:(task1)
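
The para values above are still raw references such as $global.samtools. As a rough sketch of a possible next step (not part of the parser above), the following shows one way they might be resolved against the three scopes described in the question. The function name resolve_ref, the three dictionaries, and the value sample.bam are all hypothetical and only for illustration. Note that values captured with restOfLine keep their trailing whitespace, so you would probably want to strip() them before storing them.

```python
def resolve_ref(ref, global_vars, args, local_vars):
    """Resolve one reference such as '$global.samtools', '$args.input' or '$path'.

    Hypothetical helper: the scope rules are inferred from the question,
    not defined by pyparsing.
    """
    if not ref.startswith("$"):
        return ref  # plain literal, nothing to look up
    scope, _, key = ref[1:].partition(".")
    if not key:
        # no dot, e.g. '$path': treat it as a task-local variable
        return local_vars[scope]
    if scope == "global":
        return global_vars[key]
    if scope == "args":
        return args[key]
    raise KeyError("unknown scope in reference: " + ref)


# Plain dictionaries standing in for the parsed symbol assignments,
# the user-supplied arguments, and the fields of task1.
global_vars = {"samtools": "/path/to/samtools", "picard": "/path/to/picard"}
args = {"input": "sample.bam"}  # assumed value, not from the original post
local_vars = {"path": "/path/to/task1"}

para = ["$global.samtools", "$args.input", "$path"]
resolved = [resolve_ref(r, global_vars, args, local_vars) for r in para]
print(resolved)
```

Keeping the lookup separate from the grammar means the parser only has to recognise the syntax, while undefined names surface later as a KeyError at resolution time.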

I have used pyparsing in the past and been very happy with it (q.v. the pyparsing project site).
