
Pyparsing/Python Binary Boolean Expression to XML Nesting Issue (2.7.10)

I need to parse nested binary boolean expressions into an XML tree. For example, take the expression

expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
            OR (Param3 = 1 AND Param4 = 1)) \
            AND \
            (((Param5 = 0 AND Param6 = 1 )  \
            OR(Param7 = 0 AND Param8 = 1)) \
            AND \
            ((Param9 = 0 AND Param10 = 1 )  \
            OR(Param11 = 0 AND Param12 = 1)))"

which is essentially a combination of (Expression) (Operator) (Expression) terms.

I need the output to be these expressions wrapped in the appropriate XML tags, i.e.

<MainBody>
  <FirstExpression>
    Parameter
  </FirstExpression>
  <Operator>=</Operator>
  <SecondExpression>
    1
  </SecondExpression>
</MainBody>

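where FirstExpression can be either a parameter or a MainBody (this is the nesting), Operator is always one of =, <, >, AND, OR, and SecondExpression is either an integer or a MainBody.

There will always be groups of three, i.e. the smallest discrete object consists of a first expression, an operator, and a second expression.

The code I have come up with (this is my first time using Python) gets me part of the way there.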
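
To make the target shape concrete, here is a minimal sketch that builds one such triple with the standard library's xml.etree.ElementTree. The helper name main_body is hypothetical (it is not part of pyparsing); it only illustrates that each side is either a leaf value or another MainBody.

```python
import xml.etree.ElementTree as ET

def main_body(first, operator, second):
    """Build <MainBody> from a (first, operator, second) triple.

    `first` and `second` may be plain values (leaves) or Elements
    returned by an earlier main_body() call (nesting).
    """
    body = ET.Element("MainBody")
    for tag, value in (("FirstExpression", first),
                       ("Operator", operator),
                       ("SecondExpression", second)):
        child = ET.SubElement(body, tag)
        if isinstance(value, ET.Element):
            child.append(value)       # nested MainBody
        else:
            child.text = str(value)   # leaf: parameter name, operator, or integer
    return body

left = main_body("Param1", "=", 1)
right = main_body("Param2", "=", 1)
tree = main_body(left, "OR", right)
xml_text = ET.tostring(tree).decode("ascii")
print(xml_text)
```

Because leaves are written as element text, the integer ends up inside its SecondExpression tag without any extra step.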

import pyparsing as pp
import xml.etree.ElementTree as ET


operator = pp.Regex(">=|<=|!=|>|<|=").setName("operator").setResultsName("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?").setResultsName("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".").setName("FirstExpression").setResultsName("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term).setResultsName("MainBody")


expr = pp.operatorPrecedence(condition,[
                            ("NOT", 1, pp.opAssoc.RIGHT, ),
                            ("AND", 2, pp.opAssoc.LEFT, ),
                            ("OR", 2, pp.opAssoc.LEFT, ),
                            ])


expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
                OR (Param3 = 1 AND Param4 = 1)) \
                AND \
                (((Param5 = 0 AND Param6 = 1 )  \
                OR(Param7 = 0 AND Param8 = 1)) \
                AND \
                ((Param9 = 0 AND Param10 = 1 )  \
                OR(Param11 = 0 AND Param12 = 1)))"



out = expr.parseString(expression2)
text = out.asXML()

with open('rules.xml', 'w') as f:
    f.write(text)

root = ET.parse("rules.xml").getroot()

print(ET.tostring(root))

This outputs XML of this form:

<ITEM>
  <ITEM>
    <ITEM>
      <MainBody>
        <MainBody>
          <FirstExpression>Param1</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
        <ITEM>AND</ITEM>
        <MainBody>
          <FirstExpression>Param2</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </MainBody>
      <ITEM>OR</ITEM>
      <MainBody>
        <MainBody>
          <FirstExpression>Param3</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
        <ITEM>AND</ITEM>
        <MainBody>
          <FirstExpression>Param4</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </MainBody>
    </ITEM>
    <ITEM>AND</ITEM>
    <ITEM>
      <ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param5</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param6</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
        <ITEM>OR</ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param7</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param8</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
      </ITEM>
      <ITEM>AND</ITEM>
      <ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param9</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param10</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
        <ITEM>OR</ITEM>
        <MainBody>
          <MainBody>
            <FirstExpression>Param11</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>0</SecondExpression>
          </MainBody>
          <ITEM>AND</ITEM>
          <MainBody>
            <FirstExpression>Param12</FirstExpression>
            <Operator>=</Operator>
            <SecondExpression>1</SecondExpression>
          </MainBody>
        </MainBody>
      </ITEM>
    </ITEM>
  </ITEM>
</ITEM>

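Obviously this is not what I want, since the only objects with meaningful tags are at the deepest level. For rules larger than this one I need the tagging to go as deep as the nesting does: essentially a binary tree made of MainBody, FirstExpression, Operator, and SecondExpression elements.

I also need the integer values placed inside their own tags, which is another thing I haven't figured out how to do.

I think pyparsing should be able to do this somehow with Groups, but I can't figure it out.

Can anyone offer advice on how to achieve this?

Thanks

Edit 11/5/15:

Building on what Paul wrote, I now have code for what is (most likely) a recursive grammar: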

import pyparsing as pp


operator = pp.oneOf(">= <= != > < =")("operator")
integer = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("integer")
parameter = pp.Word(pp.alphas, pp.alphanums + "_" + "." + "-")("parameter")
comparison_term = parameter | integer

firstExpression = pp.Forward()
secondExpression = pp.Forward()

mainbody = pp.Group(firstExpression + operator + secondExpression)("Mainbody")

firstExpression <<  pp.Group(parameter | pp.Optional(mainbody))("FirstExpression")
secondExpression << pp.Group(integer | pp.Optional(mainbody))("SecondExpression")

AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")

expr = pp.operatorPrecedence(mainbody,[
                            (NOT_, 1, pp.opAssoc.RIGHT, ),
                            (AND_, 2, pp.opAssoc.LEFT, ),
                            (OR_, 2, pp.opAssoc.LEFT, ),
                            ])

# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "Mainbody"

expression1 = "((Param1 = 1) \
                OR  (Param2 = 1))"

out = expr.parseString(expression1)[0] # extract item 0 from single-item list
text = out.asXML("Mainbody") # add tag for outermost element
print(text)

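This recurses infinitely. Changing the | to + in the firstExpression and secondExpression lines fixes that, but I believe it means the parser never looks for a Mainbody to group.

I have provided a simplified rule so that I can show the exact output I am trying to get.

This code produces: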
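
The infinite recursion is most likely left recursion: mainbody begins with firstExpression, and firstExpression can begin with mainbody again before any input has been consumed. The sketch below is plain Python, not pyparsing, and its function names are hypothetical; it only mimics the shape of those two rules to show the effect, and how consuming an opening parenthesis before recursing breaks the cycle.

```python
import re

def tokenize(text):
    # Split into parentheses, comparison operators, and words/integers.
    return re.findall(r"[()]|>=|<=|!=|[<>=]|[\w.]+", text)

def first_expression_left_recursive(tokens, pos):
    # Mirrors: mainbody        := firstExpression operator secondExpression
    #          firstExpression := parameter | mainbody
    if pos < len(tokens) and tokens[pos][0].isalpha():
        return tokens[pos], pos + 1          # parameter matched
    # Nothing was consumed, yet the same rule is re-entered at the same position.
    return first_expression_left_recursive(tokens, pos)

def first_expression_guarded(tokens, pos):
    # Only recurse after consuming "(", so every call makes progress.
    if tokens[pos] == "(":
        return first_expression_guarded(tokens, pos + 1)
    return tokens[pos], pos + 1

tokens = tokenize("((Param1 = 1) OR (Param2 = 1))")
try:
    first_expression_left_recursive(tokens, 0)
    outcome = "finished"
except RuntimeError:   # RecursionError is a RuntimeError subclass on Python 3
    outcome = "recursion limit hit"
print(outcome)
print(first_expression_guarded(tokens, 0))
```

The guarded version is deliberately incomplete (it only locates the first parameter); the point is that each recursive call advances the position.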

 <Mainbody>
  <Mainbody>
    <FirstExpression>
      <parameter>Param1</parameter>
    </FirstExpression>
    <operator>=</operator>
    <SecondExpression>
      <integer>1</integer>
    </SecondExpression>
  </Mainbody>
  <operator>OR</operator>
  <Mainbody>
    <FirstExpression>
      <parameter>Param2</parameter>
    </FirstExpression>
    <operator>=</operator>
    <SecondExpression>
      <integer>1</integer>
    </SecondExpression>
  </Mainbody>
</Mainbody>

What I want to get:

<Mainbody>
  <FirstExpression>
    <Mainbody>
      <FirstExpression>
        <parameter>Param1</parameter>
      </FirstExpression>
      <operator>=</operator>
      <SecondExpression>
        <integer>1</integer>
      </SecondExpression>
    </Mainbody>
  </FirstExpression>
  <operator>OR</operator>
  <SecondExpression>
    <Mainbody>
      <FirstExpression>
        <parameter>Param2</parameter>
      </FirstExpression>
      <operator>=</operator>
      <SecondExpression>
        <integer>1</integer>
      </SecondExpression>
    </Mainbody>
  </SecondExpression>
</Mainbody>

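The problem I seem to be seeing is that the parser does not correctly tag/identify/group a Mainbody as a FirstExpression or SecondExpression. I have tried adjusting the grammar, but I often end up with infinite recursion, so I suspect something is wrong with my grammar definition. I need this to handle an arbitrary number of binary groupings of (PARAMETER = INTEGER) joined by AND/OR.

Any suggestions?

Thanks

Here is your code with just a few changes:

  • Changed "AND", "OR", and "NOT" to Keyword expressions with the results name "operator", so that they get wrapped in <operator> tags
  • Modified the results name of the expr expression internal to the object created by operatorPrecedence (which was recently renamed to infixNotation)
  • Extracted the 0th element from the single-item list returned by parseString
  • Added the outermost tag name in the call to asXML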

operator = pp.oneOf(">= <= != > < =")("Operator")
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".")("FirstExpression")
comparison_term = identifier | number
condition = pp.Group(comparison_term + operator + comparison_term)("MainBody")

# define AND, OR, and NOT as keywords, with "operator" results names
AND_ = pp.Keyword("AND")("operator")
OR_ = pp.Keyword("OR")("operator")
NOT_ = pp.Keyword("NOT")("operator")

expr = pp.operatorPrecedence(condition,[
                            (NOT_, 1, pp.opAssoc.RIGHT, ),
                            (AND_, 2, pp.opAssoc.LEFT, ),
                            (OR_, 2, pp.opAssoc.LEFT, ),
                            ])

# undocumented hack to assign a results name to (expr) - RED FLAG
expr.expr.resultsName = "group"

expression2 =  "((Param1 = 1 AND Param2 = 1 ) \
                OR (Param3 = 1 AND Param4 = 1)) \
                AND \
                (((Param5 = 0 AND Param6 = 1 )  \
                OR(Param7 = 0 AND Param8 = 1)) \
                AND \
                ((Param9 = 0 AND Param10 = 1 )  \
                OR(Param11 = 0 AND Param12 = 1)))"



out = expr.parseString(expression2)[0] # extract item 0 from single-item list
text = out.asXML("expression") # add tag for outermost element
print(text)

Prints:

<expression>
  <group>
    <group>
      <MainBody>
        <FirstExpression>Param1</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
      <operator>AND</operator>
      <MainBody>
        <FirstExpression>Param2</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
    </group>
    <operator>OR</operator>
    <group>
      <MainBody>
        <FirstExpression>Param3</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
      <operator>AND</operator>
      <MainBody>
        <FirstExpression>Param4</FirstExpression>
        <Operator>=</Operator>
        <SecondExpression>1</SecondExpression>
      </MainBody>
    </group>
  </group>
  <operator>AND</operator>
  <group>
    <group>
      <group>
        <MainBody>
          <FirstExpression>Param5</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param6</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
      <operator>OR</operator>
      <group>
        <MainBody>
          <FirstExpression>Param7</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param8</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
    </group>
    <operator>AND</operator>
    <group>
      <group>
        <MainBody>
          <FirstExpression>Param9</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param10</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
      <operator>OR</operator>
      <group>
        <MainBody>
          <FirstExpression>Param11</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>0</SecondExpression>
        </MainBody>
        <operator>AND</operator>
        <MainBody>
          <FirstExpression>Param12</FirstExpression>
          <Operator>=</Operator>
          <SecondExpression>1</SecondExpression>
        </MainBody>
      </group>
    </group>
  </group>
</expression>

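So you are definitely on the right track as far as this goes, but I think the fact that we had to patch a results name onto the internal, undocumented expr member variable really is a red flag, and you will most likely soon reach the limit of what you can do with operatorPrecedence.

You may have to implement your own recursive parser to get complete control over how all the elements and sub-elements are named. You might even have to implement your own version of asXML() to control whether or not you get intermediate levels such as the <group> tags shown above.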
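
As a rough illustration of that route, here is a minimal hand-written recursive-descent sketch that uses only the standard library (no pyparsing). Everything in it is an assumption made for illustration: the names tokenize, triple, Parser, and to_xml are hypothetical, it emits the FirstExpression/Operator/SecondExpression layout from the start of the question, it gives AND higher precedence than OR, it leaves NOT out because a unary operator does not fit the three-part MainBody, and it does not check that comparison operands are a parameter and an integer. Chains such as a AND b AND c are folded to the left so every MainBody stays binary.

```python
import re
import xml.etree.ElementTree as ET

COMPARISONS = (">=", "<=", "!=", ">", "<", "=")
TOKEN = re.compile(r"\s*(>=|<=|!=|[()<>=]|[A-Za-z][\w.]*|[+-]?\d+)")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def triple(first, operator, second):
    # Wrap one (first, operator, second) group in <MainBody>.
    body = ET.Element("MainBody")
    for tag, value in (("FirstExpression", first), ("Operator", operator),
                       ("SecondExpression", second)):
        child = ET.SubElement(body, tag)
        if isinstance(value, ET.Element):
            child.append(value)   # nested MainBody
        else:
            child.text = value    # leaf token
    return body

class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError("expected %r, got %r" % (expected, token))
        self.pos += 1
        return token

    def chain(self, keyword, operand):
        # operand (keyword operand)*, folded left into binary triples
        node = operand()
        while self.peek() == keyword:
            node = triple(node, self.take(), operand())
        return node

    def or_expr(self):
        return self.chain("OR", self.and_expr)

    def and_expr(self):
        return self.chain("AND", self.atom)

    def atom(self):
        if self.peek() == "(":
            self.take("(")
            node = self.or_expr()
            self.take(")")
            return node
        # comparison: parameter, comparison operator, integer
        name, op, value = self.take(), self.take(), self.take()
        if op not in COMPARISONS:
            raise ValueError("expected a comparison operator, got %r" % op)
        return triple(name, op, value)

def to_xml(text):
    parser = Parser(tokenize(text))
    root = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing token %r" % parser.peek())
    return root

root = to_xml("((Param1 = 1) OR (Param2 = 1))")
print(ET.tostring(root).decode("ascii"))
```

Recursion happens only inside atom, after an opening parenthesis has been consumed, so the parser always advances through the input. Swapping the tag names in triple is enough to get a different layout, since the tree is built directly instead of going through asXML().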
