Pyparsing/Python Binary Boolean Expression to XML Nesting Issue (2.7.10)
I need to parse nested binary Boolean expressions into an XML tree. Take, for example, the expression
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
which is essentially a combination of (Expression) (Operator) (Expression) terms.
I need the output to be these expressions combined with the appropriate XML tags, i.e.
<MainBody>
<FirstExpression>
Parameter
</FirstExpression>
<Operator>=</Operator>
<SecondExpression>
1
</SecondExpression>
</MainBody>
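where FirstExpression is either a parameter or a MainBody (this is where the nesting comes in), Operator is always one of =, <, >, AND, OR, and SecondExpression is either an integer or a MainBody.
There will always be groups of three: the smallest discrete object consists of a first expression, an operator, and a second expression.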
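To make the target shape concrete, here is a minimal sketch that serializes such triples with the standard library's ElementTree. It assumes the tree is already available as nested (first, operator, second) tuples; the helper names `to_xml` and `_wrap` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def _wrap(tag, value):
    """Put a leaf value or a nested triple inside a <tag> element."""
    elem = ET.Element(tag)
    if isinstance(value, tuple):
        elem.append(to_xml(value))   # nested MainBody
    else:
        elem.text = str(value)       # parameter name or integer
    return elem


def to_xml(node):
    """Serialize one (first, operator, second) triple as a <MainBody>."""
    first, op, second = node
    body = ET.Element("MainBody")
    body.append(_wrap("FirstExpression", first))
    ET.SubElement(body, "Operator").text = op
    body.append(_wrap("SecondExpression", second))
    return body


# (Param1 = 1) OR (Param2 = 1), written by hand as nested triples
tree = (("Param1", "=", 1), "OR", ("Param2", "=", 1))
xml_text = ET.tostring(to_xml(tree), encoding="unicode")
print(xml_text)
```

Every level, including the outermost OR, comes out as a MainBody with exactly three children, which is the binary-tree shape described above.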
The code I have come up with (this is my first time using Python) gets me part of the way there.
import pyparsing as pp
import xml.etree.ElementTree as ET
operator = pp.Regex(">=|<=|!=|>|<|=").setName("operator").setResultsName("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?").setResultsName("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".").setName("FirstExpression").setResultsName("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term).setResultsName("MainBody")
expr = pp.operatorPrecedence(condition,[
("NOT", 1, pp.opAssoc.RIGHT, ),
("AND", 2, pp.opAssoc.LEFT, ),
("OR", 2, pp.opAssoc.LEFT, ),
])
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
out = expr.parseString(expression2)
text = out.asXML()
f = open('rules.xml','w+')
f.write(text)
f.close()
root = ET.parse("rules.xml").getroot()
print ET.tostring(root)
This outputs XML of this form:
<ITEM>
<ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param1</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param2</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param3</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param4</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
<ITEM>AND</ITEM>
<ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param5</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param6</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param7</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param8</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
<ITEM>AND</ITEM>
<ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param9</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param10</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
<ITEM>OR</ITEM>
<MainBody>
<MainBody>
<FirstExpression>Param11</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<ITEM>AND</ITEM>
<MainBody>
<FirstExpression>Param12</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</MainBody>
</ITEM>
</ITEM>
</ITEM>
</ITEM>
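Obviously this is not what I want, since the only objects with meaningful tags are at the deepest level. For rules larger than this one, I need the tagging to go as deep as necessary: essentially a binary tree made up of sets of MainBody, FirstExpression, Operator, and SecondExpression.
I also need the integer values to be placed inside tags, which is another thing I have not figured out how to do.
I think pyparsing should be able to do this with groups somehow, but I cannot figure it out.
Can anyone offer advice on how to achieve this?
Thanks
EDIT 11/5/15:
Building on what Paul wrote, I have come up with the following code, which uses a (most likely) recursive grammar: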
import pyparsing as pp
operator = pp.oneOf(">= <= != > < =")("operator")
integer = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("integer")
parameter = pp.Word(pp.alphas, pp.alphanums + "_" + "." + "-")("parameter")
comparison_term = parameter | integer
firstExpression = pp.Forward()
secondExpression = pp.Forward()
mainbody = pp.Group(firstExpression + operator + secondExpression)("Mainbody")
firstExpression << pp.Group(parameter | pp.Optional(mainbody))("FirstExpression")
secondExpression << pp.Group(integer | pp.Optional(mainbody))("SecondExpression")
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")
expr = pp.operatorPrecedence(mainbody,[
(NOT_, 1, pp.opAssoc.RIGHT, ),
(AND_, 2, pp.opAssoc.LEFT, ),
(OR_, 2, pp.opAssoc.LEFT, ),
])
# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "Mainbody"
expression1 = "((Param1 = 1) \
OR (Param2 = 1))"
out = expr.parseString(expression1)[0] # extract item 0 from single-item list
text = out.asXML("Mainbody") # add tag for outermost element
print text
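This recurses infinitely. Changing the | to + in the firstExpression and secondExpression lines fixes that, but I believe it means the parser never looks for a Mainbody to group.
I have provided a simplified rule so that I can show the exact output I am trying to get.
This code generates: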
<Mainbody>
<Mainbody>
<FirstExpression>
<parameter>Param1</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
<operator>OR</operator>
<Mainbody>
<FirstExpression>
<parameter>Param2</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</Mainbody>
What I want to get is:
<Mainbody>
<FirstExpression>
<Mainbody>
<FirstExpression>
<parameter>Param1</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</FirstExpression>
<operator>OR</operator>
<SecondExpression>
<Mainbody>
<FirstExpression>
<parameter>Param2</parameter>
</FirstExpression>
<operator>=</operator>
<SecondExpression>
<integer>1</integer>
</SecondExpression>
</Mainbody>
</SecondExpression>
</Mainbody>
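The problem, as far as I can see, is that the parser does not correctly tag/recognize/group a Mainbody as a FirstExpression or a SecondExpression. I have tried adjusting the grammar, but I often end up with infinite recursion, so I suspect something is wrong with the grammar definition. I need this to work for any number of binary groupings of (PARAMETER = INTEGER) joined by AND / OR.
Any suggestions?
Thanks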
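Here is your code with just a few changes: AND, OR, and NOT are defined as keywords with the results name "operator", so that they are wrapped in <operator> tags; and a results name is assigned to the inner expression of the expr created by operatorPrecedence (which was recently renamed to infixNotation).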
operator = pp.oneOf(">= <= != > < =")("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".")("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term)("MainBody")
# define AND, OR, and NOT as keywords, with "operator" results names
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")
expr = pp.operatorPrecedence(condition,[
(NOT_, 1, pp.opAssoc.RIGHT, ),
(AND_, 2, pp.opAssoc.LEFT, ),
(OR_, 2, pp.opAssoc.LEFT, ),
])
# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "group"
expression2 = "((Param1 = 1 AND Param2 = 1 ) \
OR (Param3 = 1 AND Param4 = 1)) \
AND \
(((Param5 = 0 AND Param6 = 1 ) \
OR(Param7 = 0 AND Param8 = 1)) \
AND \
((Param9 = 0 AND Param10 = 1 ) \
OR(Param11 = 0 AND Param12 = 1)))"
out = expr.parseString(expression2)[0] # extract item 0 from single-item list
text = out.asXML("expression") # add tag for outermost element
print text
Prints:
<expression>
<group>
<group>
<MainBody>
<FirstExpression>Param1</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param2</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param3</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param4</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
<operator>AND</operator>
<group>
<group>
<group>
<MainBody>
<FirstExpression>Param5</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param6</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param7</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param8</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
<operator>AND</operator>
<group>
<group>
<MainBody>
<FirstExpression>Param9</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param10</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
<operator>OR</operator>
<group>
<MainBody>
<FirstExpression>Param11</FirstExpression>
<Operator>=</Operator>
<SecondExpression>0</SecondExpression>
</MainBody>
<operator>AND</operator>
<MainBody>
<FirstExpression>Param12</FirstExpression>
<Operator>=</Operator>
<SecondExpression>1</SecondExpression>
</MainBody>
</group>
</group>
</group>
</expression>
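So as it stands, you are definitely on the right track, but the fact that we had to hack a results name onto the internal, undocumented expr member variable is a real red flag, and you will most likely soon reach the limit of what you can do with operatorPrecedence.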
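One possible way to stay with operatorPrecedence a little longer is to ignore results names and post-process the plain nested list instead. The sketch below assumes the parse result has been converted with asList() into nested Python lists of the form [operand, operator, operand, ...], and that only binary operators occur (NOT is not handled). The function name `fold_binary` is hypothetical.

```python
def fold_binary(tokens):
    """Fold [a, op, b, op, c, ...] into left-nested (first, op, second) triples."""
    if not isinstance(tokens, list):
        return tokens  # a leaf such as "Param1" or "1"
    if len(tokens) % 2 == 0:
        # An even length means this is not operand (operator operand)*,
        # e.g. a unary NOT group, which this sketch does not support.
        raise ValueError("expected operand (operator operand)*: %r" % (tokens,))
    node = fold_binary(tokens[0])
    for i in range(1, len(tokens), 2):
        node = (node, tokens[i], fold_binary(tokens[i + 1]))
    return node


# Hand-written stand-in for a parse result converted with asList()
nested = [
    [["Param1", "=", "1"], "AND", ["Param2", "=", "1"], "AND", ["Param3", "=", "1"]],
    "OR",
    ["Param4", "=", "1"],
]
folded = fold_binary(nested)
print(folded)
```

A chain such as a AND b AND c is grouped to the left as ((a AND b) AND c), so every node ends up with exactly two operands and can then be written out as one element with three children.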
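You will probably have to implement your own recursive parser to get full control over how all the elements and sub-elements are named. You may even need to implement your own version of asXML() to control whether you get intermediate levels, such as the <group> tags shown above.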
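As a rough illustration of what such a hand-written recursive parser could look like, here is a sketch that uses only the standard library rather than pyparsing. It is an assumption-laden example, not the approach from the code above: it supports only integer values, the comparison operators listed in `COMPARISONS`, AND binding tighter than OR, and no NOT. All names (`tokenize`, `Parser`, `binary`, `leaf`, `parse`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\s*(>=|<=|!=|[()<>=]|[A-Za-z][\w.]*|[+-]?\d+)")
COMPARISONS = {">=", "<=", "!=", ">", "<", "="}


def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def leaf(tag, text):
    elem = ET.Element(tag)
    elem.text = text
    return elem


def binary(first, op, second):
    """Build <Mainbody> with FirstExpression, operator, SecondExpression."""
    body = ET.Element("Mainbody")
    ET.SubElement(body, "FirstExpression").append(first)
    ET.SubElement(body, "operator").text = op
    ET.SubElement(body, "SecondExpression").append(second)
    return body


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return token

    def chain(self, keyword, operand):
        # operand (keyword operand)*, grouped to the left
        node = operand()
        while self.peek() == keyword:
            self.take()
            node = binary(node, keyword, operand())
        return node

    def or_expr(self):
        return self.chain("OR", self.and_expr)

    def and_expr(self):
        return self.chain("AND", self.atom)

    def atom(self):
        if self.peek() == "(":
            self.take()
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        name, op, value = self.take(), self.take(), self.take()
        if op not in COMPARISONS:
            raise ValueError("expected a comparison operator, got %r" % op)
        return binary(leaf("parameter", name), op, leaf("integer", value))


def parse(text):
    parser = Parser(tokenize(text))
    root = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return root


xml_text = ET.tostring(parse("((Param1 = 1) OR (Param2 = 1))"), encoding="unicode")
print(xml_text)
```

Because the XML elements are built directly while parsing, parentheses add no intermediate level and every AND / OR node is emitted as a Mainbody whose first and second expressions wrap the nested Mainbody elements.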