Pyparsing/Python Binary Boolean Expression to XML Nesting Issue (2.7.10)

I need to parse nested binary boolean expressions into an XML tree. For example, take the expression:

expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
            OR (Param3 = 1 AND Param4 = 1)) \
            AND \
            (((Param5 = 0 AND Param6 = 1 )  \
            OR(Param7 = 0 AND Param8 = 1)) \
            AND \
            ((Param9 = 0 AND Param10 = 1 )  \
            OR(Param11 = 0 AND Param12 = 1)))"

which is essentially a combination of (Expression) (Operator) (Expression) terms.

I need the output to be a combination of these expressions with proper XML tags, for example:

<MainBody>
  <FirstExpression>
    Parameter
  </FirstExpression>
  <Operator>=</Operator>
  <SecondExpression>
    1
  </SecondExpression>
</MainBody>

where FirstExpression can be a parameter or a MainBody (this is where the nesting comes in), Operator is always one of =, <, >, AND, OR, and SecondExpression is either an integer or a MainBody.

There will always be groups of three: the smallest discrete object consists of the first expression, the operator, and the second expression.
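
To make the target shape concrete, here is a minimal sketch that builds it by hand with xml.etree.ElementTree. The helper name main_body is hypothetical and only illustrates the nesting; it does no parsing.

```python
import xml.etree.ElementTree as ET


def main_body(first, op, second):
    """Build one <MainBody> triple; operands may be text or nested <MainBody> elements."""
    body = ET.Element("MainBody")
    for tag, value in (("FirstExpression", first),
                       ("Operator", op),
                       ("SecondExpression", second)):
        child = ET.SubElement(body, tag)
        if isinstance(value, ET.Element):
            child.append(value)      # nesting: the operand is itself a MainBody
        else:
            child.text = str(value)  # leaf: parameter name, operator, or integer
    return body


# (Param1 = 1) AND (Param2 = 1)
tree = main_body(main_body("Param1", "=", 1), "AND", main_body("Param2", "=", 1))
xml_text = ET.tostring(tree).decode("ascii")
print(xml_text)
```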

The code I've come up with (this is my first time using Python) gets me partway there.

import pyparsing as pp
import xml.etree.ElementTree as ET


operator = pp.Regex(">=|<=|!=|>|<|=").setName("operator").setResultsName("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?").setResultsName("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".").setName("FirstExpression").setResultsName("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term).setResultsName("MainBody")


expr = pp.operatorPrecedence(condition,[
                            ("NOT", 1, pp.opAssoc.RIGHT, ),
                            ("AND", 2, pp.opAssoc.LEFT, ),
                            ("OR", 2, pp.opAssoc.LEFT, ),
                            ])


expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
                OR (Param3 = 1 AND Param4 = 1)) \
                AND \
                (((Param5 = 0 AND Param6 = 1 )  \
                OR(Param7 = 0 AND Param8 = 1)) \
                AND \
                ((Param9 = 0 AND Param10 = 1 )  \
                OR(Param11 = 0 AND Param12 = 1)))"



out = expr.parseString(expression2)
text = out.asXML()

# write the XML out, then read it back to check that it is well-formed
with open('rules.xml', 'w') as f:
    f.write(text)

root = ET.parse("rules.xml").getroot()

print(ET.tostring(root))

This outputs XML of this form:

<ITEM>
  <ITEM>
    <ITEM>
      <MainBody>
        <MainBody>
          <FirstExpression>Param1</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
        <ITEM>AND</ITEM>
        <MainBody>
          <FirstExpression>Param2</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </MainBody>
      <ITEM>OR</ITEM>
      <MainBody>
        <MainBody>
          <FirstExpression>Param3</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
        <ITEM>AND</ITEM>
        <MainBody>
          <FirstExpression>Param4</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </MainBody>
    </ITEM>
    <ITEM>AND</ITEM>
    <ITEM>
      <ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param5</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param6</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
        <ITEM>OR</ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param7</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param8</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
      </ITEM>
      <ITEM>AND</ITEM>
      <ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param9</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param10</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
        <ITEM>OR</ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param11</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param12</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
      </ITEM>
    </ITEM>
  </ITEM>
</ITEM>

Obviously this isn't what I want, as the only objects with tags are at the deepest level. I need it to nest as deep as necessary for much larger rules than this: essentially a binary tree made of MainBody, FirstExpression, Operator, and SecondExpression elements.

I also need to place integer values inside tags, which is another thing I haven't figured out how to do.

I think that pyparsing should be able to do this with groups somehow, but I can't figure it out.

Can anyone offer a suggestion on how to achieve this?

Thanks

EDIT 11/5/15:

Building off of what Paul wrote, I've arrived at this code with a (well, intended to be) recursive grammar:

import pyparsing as pp


operator = pp.oneOf(">= <= != > < =")("operator")
integer = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("integer")
parameter = pp.Word(pp.alphas, pp.alphanums + "_" + "." + "-")("parameter")
comparison_term = parameter | integer

firstExpression = pp.Forward()
secondExpression = pp.Forward()

mainbody = pp.Group(firstExpression + operator + secondExpression)("Mainbody")

firstExpression <<  pp.Group(parameter | pp.Optional(mainbody))("FirstExpression")
secondExpression << pp.Group(integer | pp.Optional(mainbody))("SecondExpression")

AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")

expr = pp.operatorPrecedence(mainbody,[
                            (NOT_, 1, pp.opAssoc.RIGHT, ),
                            (AND_, 2, pp.opAssoc.LEFT, ),
                            (OR_, 2, pp.opAssoc.LEFT, ),
                            ])

# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "Mainbody"

expression1 = "((Param1 = 1) \
                OR  (Param2 = 1))"

out = expr.parseString(expression1)[0] # extract item 0 from single-item list
text = out.asXML("Mainbody") # add tag for outermost element
print(text)

This recurses infinitely. Changing the | to + in the firstExpression and secondExpression lines fixes this, but I believe it causes the parser to never look for the mainbody to group.

I've included a simplified rule so I can show the exact output I'm trying to get.

This code generates:

<Mainbody>
  <Mainbody>
    <FirstExpression>
      <parameter>Param1</parameter>
    </FirstExpression>
    <operator>=</operator>
    <SecondExpression>
      <integer>1</integer>
    </SecondExpression>
  </Mainbody>
  <operator>OR</operator>
  <Mainbody>
    <FirstExpression>
      <parameter>Param2</parameter>
    </FirstExpression>
    <operator>=</operator>
    <SecondExpression>
      <integer>1</integer>
    </SecondExpression>
  </Mainbody>
</Mainbody>

What I'm trying to get:

<Mainbody>
  <FirstExpression>
    <Mainbody>
      <FirstExpression>
        <parameter>Param1</parameter>
      </FirstExpression>
      <operator>=</operator>
      <SecondExpression>
        <integer>1</integer>
      </SecondExpression>
    </Mainbody>
  </FirstExpression>
  <operator>OR</operator>
  <SecondExpression>
    <Mainbody>
      <FirstExpression>
        <parameter>Param2</parameter>
      </FirstExpression>
      <operator>=</operator>
      <SecondExpression>
        <integer>1</integer>
      </SecondExpression>
    </Mainbody>
  </SecondExpression>
</Mainbody>

It looks like the issue I'm seeing is that the parser isn't properly tagging/recognizing/grouping a mainbody as FirstExpression or SecondExpression. I've tried adjusting the grammar and often get infinite recursion, so I have a feeling something is wrong in my grammar definition. I need this to work for any number of binary (PARAMETER = INTEGER) groups joined by AND/OR.

Any suggestions?

Thanks

Here is your code with just a few changes:

  • change "AND", "OR", and "NOT" to Keyword expressions, with a results name of "operator", so that they get wrapped in <operator> tags
  • hack a results name onto the internal expression of the expr created by operatorPrecedence (which was recently renamed to infixNotation)
  • extract the 0'th element from the single-item list returned by parseString
  • add an outermost tag name in the call to asXML

import pyparsing as pp

operator = pp.oneOf(">= <= != > < =")("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".")("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term)("MainBody")

# define AND, OR, and NOT as keywords, with "operator" results names
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")

expr = pp.operatorPrecedence(condition,[
                            (NOT_, 1, pp.opAssoc.RIGHT, ),
                            (AND_, 2, pp.opAssoc.LEFT, ),
                            (OR_, 2, pp.opAssoc.LEFT, ),
                            ])

# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "group"

expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
                OR (Param3 = 1 AND Param4 = 1)) \
                AND \
                (((Param5 = 0 AND Param6 = 1 )  \
                OR(Param7 = 0 AND Param8 = 1)) \
                AND \
                ((Param9 = 0 AND Param10 = 1 )  \
                OR(Param11 = 0 AND Param12 = 1)))"



out = expr.parseString(expression2)[0] # extract item 0 from single-item list
text = out.asXML("expression") # add tag for outermost element
print(text)

prints:

<expression>
  <group>
    <group>
      <MainBody>
        <FirstExpression>Param1</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
      <operator>AND</operator>
      <MainBody>
        <FirstExpression>Param2</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
    </group>
    <operator>OR</operator>
    <group>
      <MainBody>
        <FirstExpression>Param3</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
      <operator>AND</operator>
      <MainBody>
        <FirstExpression>Param4</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
    </group>
  </group>
  <operator>AND</operator>
  <group>
    <group>
      <group>
        <MainBody>
          <FirstExpression>Param5</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param6</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
      <operator>OR</operator>
      <group>
        <MainBody>
          <FirstExpression>Param7</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param8</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
    </group>
    <operator>AND</operator>
    <group>
      <group>
        <MainBody>
          <FirstExpression>Param9</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param10</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
      <operator>OR</operator>
      <group>
        <MainBody>
          <FirstExpression>Param11</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param12</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
    </group>
  </group>
</expression>

So you were definitely on the right track, as far as this goes. But having to hack a results name into an internal, undocumented member variable of expr is something of a red flag, and more than likely you will soon reach the limit of what you can do with operatorPrecedence.
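
If the tag names are the only obstacle, one option is to skip results names and asXML() altogether and convert the nested parse results yourself. The sketch below assumes the results have already been turned into plain nested lists (for example with out.asList()), where each comparison is a three-item list and each AND/OR level is a flat list alternating operands and operators. The parsed literal and the names wrap and to_element are illustrative and not part of pyparsing, NOT is not handled, and leaves go straight into FirstExpression/SecondExpression without inner parameter/integer tags.

```python
import xml.etree.ElementTree as ET


def wrap(tag, value):
    """Wrap a leaf string or an already-built element in <tag>."""
    elem = ET.Element(tag)
    if isinstance(value, ET.Element):
        elem.append(value)
    else:
        elem.text = value
    return elem


def to_element(node):
    """Turn [operand, op, operand, op, operand, ...] into nested <MainBody> triples."""
    if not isinstance(node, list):
        return node                       # leaf: parameter name or number text
    left = to_element(node[0])
    # fold "a AND b AND c" left to right into ((a AND b) AND c)
    for i in range(1, len(node) - 1, 2):
        body = ET.Element("MainBody")
        body.append(wrap("FirstExpression", left))
        body.append(wrap("Operator", node[i]))
        body.append(wrap("SecondExpression", to_element(node[i + 1])))
        left = body
    return left


# assumed shape of the parse results for "((Param1 = 1) OR (Param2 = 1))"
parsed = [["Param1", "=", "1"], "OR", ["Param2", "=", "1"]]
root = to_element(parsed)
print(ET.tostring(root).decode("ascii"))
```

The left-to-right fold is a design choice: it matches the LEFT associativity declared for AND and OR, so every level ends up as exactly one triple.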

You will probably have to implement your own recursive parser to get full control over how all the elements and sub-elements get named. You may even need to implement your own version of asXML() to control whether or not you get intermediate levels, such as the <group> tags shown above.
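
As a rough illustration of what such a hand-written parser could look like, here is a small recursive-descent sketch that uses only the standard library and builds the nested MainBody elements directly. Everything in it (the TOKEN pattern, the triple helper, the Parser class) is hypothetical rather than pyparsing API. It assumes AND binds tighter than OR, omits NOT, and does no real error reporting.

```python
import re
import xml.etree.ElementTree as ET

# parentheses, comparison operators, names (including AND/OR), plain numbers
TOKEN = re.compile(r"\(|\)|>=|<=|!=|[<>=]|[A-Za-z_][\w.]*|[+-]?\d+(?:\.\d*)?")


def triple(first, op, second):
    """Build <MainBody> from two operands (text or element) and an operator."""
    body = ET.Element("MainBody")
    for tag, value in (("FirstExpression", first), ("Operator", op),
                       ("SecondExpression", second)):
        child = ET.SubElement(body, tag)
        if isinstance(value, ET.Element):
            child.append(value)
        else:
            child.text = value
    return body


class Parser(object):
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)   # unrecognized characters are silently skipped
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return token

    def binary(self, keyword, operand):
        """operand (keyword operand)*, folded left to right into nested triples."""
        left = operand()
        while self.peek() == keyword:
            self.take()
            left = triple(left, keyword, operand())
        return left

    def expr(self):        # OR binds loosest
        return self.binary("OR", self.and_expr)

    def and_expr(self):    # AND binds tighter than OR
        return self.binary("AND", self.atom)

    def atom(self):
        if self.peek() == "(":
            self.take()
            inner = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return inner
        # comparison: name, operator, value
        return triple(self.take(), self.take(), self.take())


root = Parser("((Param1 = 1) OR (Param2 = 1))").expr()
print(ET.tostring(root).decode("ascii"))
```

Because each grammar rule consumes at least one token before it recurses, this shape avoids the infinite recursion that a rule starting with a reference to itself runs into.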
