The trick to nested structures in pyparsing

I am struggling to parse nested structures with PyParsing. I've searched many of the 'nested' example uses of PyParsing, but I don't see how to fix my problem.

Here is what my internal structure looks like:

texture_unit optionalName
{
    texture required_val
    prop_name1 prop_val1
    prop_name2 prop_val1
}

And here is what my external structure looks like. It can contain zero or more of the internal structures.

pass optionalName
{
    prop_name1 prop_val1
    prop_name2 prop_val1

    texture_unit optionalName
    {
        // edit 2: showing use of '.' character in value
        texture required_val.file.name optional_val // edit 1: forgot this line in initial post.

        // edit 2: showing potentially multiple values
        prop_name3 prop_val1 prop_val2
        prop_name4 prop_val1
    }
}

I am successfully parsing the internal structure. Here is my code for that.

prop_ = pp.Group(pp.Word(pp.alphanums+'_')+pp.Group(pp.OneOrMore(pp.Word(pp.alphanums+'_'+'.'))))
texture_props_ = pp.Group(pp.Literal('texture') + pp.Word(pp.alphanums+'_'+'.')) + pp.ZeroOrMore(prop_)
texture_ = pp.Forward()
texture_ << pp.Literal('texture_unit').suppress() + pp.Optional(pp.Word(pp.alphanums+'_')).suppress() + pp.Literal('{').suppress() + texture_props_ + pp.Literal('}').suppress()

Here is my attempt to parse the outer structure:

pass_props_ = pp.ZeroOrMore(prop_)
pass_ = pp.Forward()
pass_ << pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() + pp.Literal('{').suppress() + pass_props_ + pp.ZeroOrMore(texture_) + pp.Literal('}').suppress()

When I call pass_.parseString(testPassStr), I see an error in the console saying that "}" was expected.

I see this as very similar to the C struct example, but I'm not sure what the missing magic is. I'm also curious how to control the resulting data structure when using nestedExpr.

There are two problems:

  1. In your grammar you marked the texture literal as required in the texture_unit block, but there is no texture in your second example.
  2. In the second example, pass_props_ also matches texture_unit optionalName, because those two words look like a property name followed by a value. After that, pp.Literal('}') expects } but finds { instead. This is the reason for the error.

We can check this by changing the pass_ rule like this:

pass_ << pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() + \
             pp.Literal('{').suppress() + pass_props_

print(pass_.parseString(s2))

It gives us the following output:

[['prop_name', ['prop_val', 'prop_name', 'prop_val', 'texture_unit', 'optionalName']]]

We can see that pass_props_ has consumed texture_unit optionalName.
So what we want is this: a prop_ token may contain alphanumerics, _ and ., but must not match the texture_unit literal. We can do that with a regex and a negative lookahead:

prop_ = pp.Group(  pp.Regex(r'(?!texture_unit)[a-z0-9_]+')+ pp.Group(pp.OneOrMore(pp.Regex(r'(?!texture_unit)[a-z0-9_.]+'))) )
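The same lookahead can probably also be written with pyparsing's own ~ (NotAny) operator instead of a regex. The following is only a sketch: the names block_kw, name, value and prop are made up for illustration, and it assumes texture_unit is the only block keyword that needs to be excluded.

```python
import pyparsing as pp

# Assumption: 'texture_unit' is the only word that must never be read as a
# property name or value. All names below are illustrative.
block_kw = pp.Keyword('texture_unit')

# ~block_kw is a negative lookahead: it consumes nothing, it only fails
# when the block keyword comes next.
name = ~block_kw + pp.Word(pp.alphanums + '_')
value = ~block_kw + pp.Word(pp.alphanums + '_.')
prop = pp.Group(name + pp.Group(pp.OneOrMore(value)))

sample = "prop_name1 prop_val1 texture_unit optionalName"
result = pp.ZeroOrMore(prop).parseString(sample).asList()
print(result)  # [['prop_name1', ['prop_val1']]]
```

One difference from the regex version: Keyword only rejects the whole word texture_unit, so a name such as texture_unit_count would still be accepted as a property, whereas (?!texture_unit) rejects anything that starts with those characters.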

Finally, a working example looks like this:

import pyparsing as pp

s1 = '''texture_unit optionalName
    {
    texture required_val
    prop_name prop_val
    prop_name prop_val
}'''

prop_ = pp.Group(  pp.Regex(r'(?!texture_unit)[a-z0-9_]+')+ pp.Group(pp.OneOrMore(pp.Regex(r'(?!texture_unit)[a-z0-9_.]+'))) )
texture_props_ = pp.Group(pp.Literal('texture') + pp.Word(pp.alphanums+'_'+'.')) + pp.ZeroOrMore(prop_)
texture_ = pp.Forward()
texture_ = pp.Literal('texture_unit').suppress() + pp.Word(pp.alphanums+'_').suppress() +\
           pp.Literal('{').suppress() + pp.Optional(texture_props_) + pp.Literal('}').suppress()

print(texture_.parseString(s1))

s2 = '''pass optionalName
{
    prop_name1 prop_val1.name
    texture_unit optionalName1
    {
        texture required_val1
        prop_name2 prop_val12
        prop_name3 prop_val13
    }
    texture_unit optionalName2
    {
        texture required_va2l
        prop_name2 prop_val22
        prop_name3 prop_val23
    }
}'''

pass_props_ = pp.ZeroOrMore(prop_  )
pass_ = pp.Forward()

pass_ = pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() +\
        pp.Literal('{').suppress() + pass_props_ + pp.ZeroOrMore(texture_ ) + pp.Literal('}').suppress()

print(pass_.parseString(s2))

Output:

[['texture', 'required_val'], ['prop_name', ['prop_val', 'prop_name', 'prop_val']]]
[['prop_name1', ['prop_val1.name']], ['texture', 'required_val1'], ['prop_name2', ['prop_val12', 'prop_name3', 'prop_val13']], ['texture', 'required_va2l'], ['prop_name2', ['prop_val22', 'prop_name3', 'prop_val23']]]

The answer I was looking for is related to the use of the 'Forward' parser, shown in the Cstruct example (linked in OP).

The hard part of defining a grammar for a nested structure is defining all the possible member types of the structure. Those member types need to include the structure itself, which is not yet defined at that point.

The "trick" to defining the pyparsing grammar for a nested structure is to delay the definition of the structure, but include a "forward declared" version of the structure when defining the structure members, so the members can also include a structure. Then complete the structure grammar as a list of members.

struct = Forward()
member = blah | blah2 | struct
struct << ZeroOrMore( Group(member) )
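To make that outline concrete, here is a minimal runnable sketch. The brace syntax, the "name value" field and the names ident, field and member are assumptions chosen for the demo, not taken from the Cstruct example.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, '{}')
ident = pp.Word(pp.alphas + '_', pp.alphanums + '_')

struct = pp.Forward()            # declared now, defined a few lines below
field = pp.Group(ident + ident)  # hypothetical "name value" member
member = struct | field          # a member may itself be a struct

# Complete the forward declaration: name { members... }
struct <<= pp.Group(ident + LBRACE + pp.Group(pp.ZeroOrMore(member)) + RBRACE)

nested = struct.parseString("outer { a b inner { c d } }").asList()
print(nested)  # [['outer', [['a', 'b'], ['inner', [['c', 'd']]]]]]
```

The sketch uses <<= rather than << because Python gives << higher precedence than |, so struct << a | b is read as (struct << a) | b and silently drops b from the definition. With <<= the whole right-hand side is evaluated first.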

This is also discussed over here: Pyparsing: Parsing semi-JSON nested plaintext data to a list

The OP (mine) described test data and a grammar that were not specific enough, so the grammar matched when it should have failed. @NorthCat correctly spotted the undesired matches in the grammar. However, the suggestion to define many 'negative lookaheads' seemed unmanageable.

Instead of defining what should not match, my solution explicitly listed the possible matches. The matches were member keywords, defined using oneOf('list of words separated by spaces'). Once I had specified all the possible matches, I realized my structure was not really a nested structure, but a structure of finite depth, with a different grammar describing each depth. So my member definition did not require the Forward declaration trick.
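Roughly, that approach could look like the sketch below. The keyword lists (ambient, diffuse, texture, filtering) and all variable names are invented for illustration, and each member takes a single value to keep the example short.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, '{}')
value = pp.Word(pp.alphanums + '_.')
opt_name = pp.Optional(pp.Word(pp.alphanums + '_')).suppress()

# Hypothetical keyword lists, one per depth of the structure.
pass_keys = pp.oneOf('ambient diffuse')
texture_keys = pp.oneOf('texture filtering')

# Inner level: only texture keywords are legal here.
texture_member = pp.Group(texture_keys + value)
texture_unit = pp.Group(pp.Keyword('texture_unit') + opt_name + LBRACE
                        + pp.ZeroOrMore(texture_member) + RBRACE)

# Outer level: pass keywords first, then zero or more texture_unit blocks.
pass_member = pp.Group(pass_keys + value)
pass_block = (pp.Keyword('pass').suppress() + opt_name + LBRACE
              + pp.ZeroOrMore(pass_member) + pp.ZeroOrMore(texture_unit) + RBRACE)

sample = """pass main
{
    ambient 0.5
    diffuse 1.0
    texture_unit base
    {
        texture wall.png
        filtering linear
    }
}"""
parsed = pass_block.parseString(sample).asList()
print(parsed)
```

Because texture_unit is not in pass_keys, the outer member rule cannot swallow it, so no lookahead is needed. No Forward is needed either, since neither level refers back to itself.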

The terminator of my member definitions was different from the one in the Cstruct example. Instead of terminating with a ';' (semicolon) as in C++, my member definitions needed to terminate at the end of the line. In pyparsing, you can specify the end of the line with the 'LineEnd' parser. So I defined my members as a list of values NOT including the 'LineEnd', like this. Notice the use of the "Not" (~) operator in the last definition:

EOL = LineEnd().suppress()
ident = Word( alphas+"_", alphanums+"_$@#." )
integer = Word(nums)
real = Combine(Optional(oneOf('+ -')) + Word(nums) + '.' + Optional(Word(nums)))
propVal = real | integer | ident
propList = Group(OneOrMore(~EOL + propVal))
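For anyone who wants to try this, here is a self-contained sketch that repeats the definitions above and adds a member rule plus sample input. The member rule and the sample text are assumptions of mine for the demo.

```python
from pyparsing import (Combine, Group, LineEnd, OneOrMore, Optional, Word,
                       alphanums, alphas, nums, oneOf)

EOL = LineEnd().suppress()
ident = Word(alphas + "_", alphanums + "_$@#.")
integer = Word(nums)
real = Combine(Optional(oneOf('+ -')) + Word(nums) + '.' + Optional(Word(nums)))
propVal = real | integer | ident
# ~EOL stops the value list at the end of the current line.
propList = Group(OneOrMore(~EOL + propVal))

# Hypothetical member rule: a name, its values, then the line end.
member = Group(ident + propList) + EOL

text = "prop_name3 prop_val1 prop_val2\nprop_name4 1.5 7\n"
rows = OneOrMore(member).parseString(text).asList()
print(rows)  # [['prop_name3', ['prop_val1', 'prop_val2']], ['prop_name4', ['1.5', '7']]]
```

Without the ~EOL check, the value list would run on past the newline and swallow prop_name4 as a value of prop_name3, because pyparsing skips newlines as ordinary whitespace by default.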
