Pyparsing/Python Binary Boolean Expression to XML Nesting Issue (2.7.10)
I need to parse nested binary boolean expressions into an XML tree. For example, take the expression
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
which is essentially a combination of (Expression) (Operator) (Expression) terms.
I need the output to be a combination of these expressions with proper tags in XML, i.e.
<MainBody>
<FirstExpression>
Parameter
</FirstExpression>
<Operator>=</Operator>
<SecondExpression>
1
</SecondExpression>
</MainBody>
where FirstExpression can be a parameter or a MainBody (this is where the nesting comes in), Operator is always one of =, <, >, AND, OR, and SecondExpression is either an integer or a MainBody.
There will always be groups of three; that is, the smallest discrete object consists of the first expression, the operator, and the second expression.
The code I've come up with (this is my first time using Python) gets me partway there.
import pyparsing as pp
import xml.etree.ElementTree as ET
operator = pp.Regex(">=|<=|!=|>|<|=").setName("operator").setResultsName("Operator")
number = pp.Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?").setResultsName("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".").setName("FirstExpression").setResultsName("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term).setResultsName("MainBody")
expr = pp.operatorPrecedence(condition,[
("NOT", 1, pp.opAssoc.RIGHT, ),
("AND", 2, pp.opAssoc.LEFT, ),
("OR", 2, pp.opAssoc.LEFT, ),
])
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
out = expr.parseString(expression2)
text = out.asXML()
f = open('rules.xml','w+')
f.write(text)
f.close()
root = ET.parse("rules.xml").getroot()
print ET.tostring(root)
This outputs XML of this form:
<ITEM>
<ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param1</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param2</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param3</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param4</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
<ITEM>AND</ITEM>
<ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param5</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param6</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param7</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param8</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
<ITEM>AND</ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param9</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param10</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param11</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param12</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
</ITEM>
</ITEM>
</ITEM>
Obviously this isn't what I want, as the only objects with tags are at the deepest level. I need it to nest as deep as necessary for much larger rules than this: essentially a binary tree built from MainBody, FirstExpression, Operator, and SecondExpression elements.
I also need to place integer values inside tags, which is another thing I haven't figured out how to do.
I think that pyparsing should be able to do this with groups somehow, but I can't figure it out.
Can anyone offer a suggestion on how to achieve this?
Thanks
EDIT 11/5/15:
Building off of what Paul wrote, I've arrived at this code with a (well, intended to be) recursive grammar:
import pyparsing as pp
operator = pp.oneOf(">= <= != > < =")("operator")
integer = pp.Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")("integer")
parameter = pp.Word(pp.alphas, pp.alphanums + "_" + "." + "-")("parameter")
comparison_term = parameter | integer
firstExpression = pp.Forward()
secondExpression = pp.Forward()
mainbody = pp.Group(firstExpression + operator + secondExpression)("Mainbody")
firstExpression << pp.Group(parameter | pp.Optional(mainbody))("FirstExpression")
secondExpression << pp.Group(integer | pp.Optional(mainbody))("SecondExpression")
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")
expr = pp.operatorPrecedence(mainbody,[
(NOT_, 1, pp.opAssoc.RIGHT, ),
(AND_, 2, pp.opAssoc.LEFT, ),
(OR_, 2, pp.opAssoc.LEFT, ),
])
# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "Mainbody"
expression1 = "((Param1 = 1) \
OR (Param2 = 1))"
out = expr.parseString(expression1)[0] # extract item 0 from single-item list
text = out.asXML("Mainbody") # add tag for outermost element
print text
This will recurse infinitely. Changing the | to + in the firstExpression and secondExpression lines fixes this, but I believe it causes the parser to never look for the mainbody to group.
I've included a simplified rule so I can show the exact output I'm trying to get.
This code generates:
<Mainbody>
<Mainbody>
<FirstExpression>
<parameter>Param1</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
<operator>OR</operator>
<Mainbody>
<FirstExpression>
<parameter>Param2</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</Mainbody>
What I'm trying to get:
<Mainbody>
<FirstExpression>
<Mainbody>
<FirstExpression>
<parameter>Param1</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</FirstExpression>
<operator>OR</operator>
<SecondExpression>
<Mainbody>
<FirstExpression>
<parameter>Param2</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</SecondExpression>
</Mainbody>
It looks like the issue I'm seeing is that the parser isn't properly tagging/recognizing/grouping a mainbody as FirstExpression or SecondExpression. I've tried adjusting the grammar and often get infinite recursion, so I have a feeling something is wrong in my grammar definition. I need this to work for any number of binary groups of (PARAMETER = INTEGER) joined by AND/OR.
Any suggestions?
Thanks
Here is your code with just a few changes:
"AND", "OR", and "NOT" are changed to Keyword expressions with the results name "operator", so that they get wrapped in <operator> tags.
The results name of the inner expression of the expr created by operatorPrecedence (which was recently renamed to infixNotation) is modified.
operator = pp.oneOf(">= <= != > < =")("Operator")
number = pp.Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".")("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term)("MainBody")
# define AND, OR, and NOT as keywords, with "operator" results names
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")
expr = pp.operatorPrecedence(condition,[
(NOT_, 1, pp.opAssoc.RIGHT, ),
(AND_, 2, pp.opAssoc.LEFT, ),
(OR_, 2, pp.opAssoc.LEFT, ),
])
# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "group"
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
out = expr.parseString(expression2)[0] # extract item 0 from single-item list
text = out.asXML("expression") # add tag for outermost element
print text
prints:
<expression>
<group>
<group>
<MainBody>
<FirstExpression>Param1</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param2</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param3</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param4</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
<operator>AND</operator>
<group>
<group>
<group>
<MainBody>
<FirstExpression>Param5</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param6</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param7</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param8</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
<operator>AND</operator>
<group>
<group>
<MainBody>
<FirstExpression>Param9</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param10</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param11</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param12</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
</group>
</expression>
So you were definitely on the right track, as far as this goes, but I think the fact that we have to hack a results name into an internal, undocumented member variable of expr is something of a red flag, and more than likely you will soon reach the limit of what you can do with operatorPrecedence.
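One possible way to avoid depending on results names at all is to take the plain nested lists from asList() and build the XML yourself. The following is only a sketch under assumptions: it assumes the parse result has the shape [operand, operator, operand, ...], where an operand is either a string or another such list, and the helper names wrap and to_mainbody are made up for illustration. Unary NOT is not handled.

```python
import xml.etree.ElementTree as ET


def wrap(tag, operand):
    """Put an operand (string, nested list, or Element) inside <tag>."""
    holder = ET.Element(tag)
    if isinstance(operand, ET.Element):
        holder.append(operand)
    elif isinstance(operand, list):
        holder.append(to_mainbody(operand))
    else:
        holder.text = str(operand)
    return holder


def to_mainbody(node):
    """Turn [left, op, right, op, right, ...] into nested <Mainbody> elements."""
    # A list holding just one nested list is a redundant wrapper; unwrap it.
    if len(node) == 1 and isinstance(node[0], list):
        return to_mainbody(node[0])
    if len(node) < 3 or len(node) % 2 == 0:
        raise ValueError("expected operand/operator/operand, got %r" % (node,))
    # Fold chains such as [a, 'AND', b, 'AND', c] from left to right so that
    # every <Mainbody> ends up with exactly three children.
    left = node[0]
    for i in range(1, len(node), 2):
        body = ET.Element("Mainbody")
        body.append(wrap("FirstExpression", left))
        ET.SubElement(body, "Operator").text = node[i]
        body.append(wrap("SecondExpression", node[i + 1]))
        left = body
    return left


# Hand-written stand-in for expr.parseString(...)[0].asList()
parsed = [["Param1", "=", "1"], "OR", ["Param2", "=", "1"]]
print(ET.tostring(to_mainbody(parsed)).decode("ascii"))
```

This puts the parameter and integer text directly inside FirstExpression and SecondExpression; adding inner <parameter> or <integer> tags would be a small change in wrap.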
You will probably have to implement your own recursive parser to get full control over how all the elements and sub-elements get named. You may even need to implement your own version of asXML() to control whether or not you get intermediate levels, such as the <group> tags shown above.
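To give a rough idea of what such a hand-written recursive parser could look like, here is a minimal sketch that uses only the standard library rather than pyparsing. Everything in it (the TOKEN pattern, the Parser class, the binary and parse helpers) is a hypothetical illustration, not pyparsing API. It assumes comparisons have the form parameter-operator-integer, that AND binds tighter than OR, and it leaves out NOT.

```python
import re
import xml.etree.ElementTree as ET

# Assumed token set: comparison operators, parentheses, names, integers.
TOKEN = re.compile(r"\s*(>=|<=|!=|[()=<>]|[A-Za-z][\w.]*|[+-]?\d+)")
COMPARISONS = {">=", "<=", "!=", ">", "<", "="}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def binary(first, op, second):
    """Build <Mainbody> with FirstExpression, Operator, SecondExpression."""
    body = ET.Element("Mainbody")
    parts = (("FirstExpression", first), ("Operator", op),
             ("SecondExpression", second))
    for tag, value in parts:
        child = ET.SubElement(body, tag)
        if isinstance(value, ET.Element):
            child.append(value)   # nested Mainbody
        else:
            child.text = value    # plain parameter, operator, or integer
    return body


class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return token

    def parse_or(self):
        # Lowest precedence: and_expr (OR and_expr)*
        left = self.parse_and()
        while self.peek() == "OR":
            left = binary(left, self.take(), self.parse_and())
        return left

    def parse_and(self):
        # Higher precedence: atom (AND atom)*
        left = self.parse_atom()
        while self.peek() == "AND":
            left = binary(left, self.take(), self.parse_atom())
        return left

    def parse_atom(self):
        # Either a parenthesized expression or a single comparison.
        if self.peek() == "(":
            self.take()
            inner = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return inner
        first, op, second = self.take(), self.take(), self.take()
        if op not in COMPARISONS:
            raise ValueError("expected a comparison operator, got %r" % op)
        return binary(first, op, second)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.peek())
    return tree


print(ET.tostring(parse("((Param1 = 1) OR (Param2 = 1))")).decode("ascii"))
```

Because each loop always consumes an operand before looking for an operator, this shape avoids the left recursion that a rule like "mainbody starts with firstExpression, which may itself be a mainbody" runs into, and since the XML is built directly there are no intermediate <group> levels to strip out.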