
Using Python Yacc\Lex as a formula parser

At the moment I'm working on using the Python implementation of Yacc/Lex (PLY) to build a formula parser for converting strings of formulae into a set of class-defined operands. So far I've been mostly successful, but I've come to an impasse in defining the parsing rules, due to ambiguity with parentheses and several shift/reduce conflicts.

The Backus–Naur form for the formulae I've been working on is

phi ::= p ; !p ; phi_0 & phi_1 ; phi_0 | phi_1 ; AX phi ; AF phi ; AG phi ; AU phi_0 U phi_1.

I've also been trying to allow arbitrary matched parentheses, but this is where a lot of the confusion is coming from, and I suspect it is where the shift/reduce conflicts are coming from too. Parentheses are fairly necessary for the task I'm applying this to, since they force specific evaluation order on formulas, so I have to work that out.

Currently my parser is defined inside a class, which builds its lexical analyser with

        tokens = (
            'NEGATION',
            'FUTURE',
            'GLOBAL',
            'NEXT',
            'CONJUNCTION',
            'DISJUNCTION',
            'EQUIVALENCE',
            'IMPLICATION',
            'PROPOSITION',
            'LPAREN',
            'RPAREN',
            'TRUE',
            'FALSE',
        )

        # regex in order of parsing precedence
        t_NEGATION    = r'[\s]*\![\s]*'
        t_FUTURE      = r'[\s]*AF[\s]*'
        t_GLOBAL      = r'[\s]*AG[\s]*'
        t_NEXT        = r'[\s]*AX[\s]*'
        t_CONJUNCTION = r'[\s]*\&[\s]*'
        t_DISJUNCTION = r'[\s]*\|[\s]*'
        t_EQUIVALENCE = r'[\s]*\<\-\>[\s]*'
        t_IMPLICATION = r'[\s]*[^<]\-\>[\s]*'
        t_LPAREN      = r'[\s]*\([\s]*'
        t_RPAREN      = r'[\s]*\)[\s]*'
        t_PROPOSITION = r'[\s]*[a-z]+[-\w\._]*[\s]*'
        t_TRUE        = r'[\s]*TRUE[\s]*'
        t_FALSE       = r'[\s]*FALSE[\s]*'

        precedence = (
        ('left', 'ASSIGNMENT'),
        ('left', 'NEGATION'),
        ('left', 'GLOBAL','NEXT','FUTURE'),
        ('left', 'CONJUNCTION'), 
        ('left', 'DISJUNCTION'), 
        ('left', 'EQUIVALENCE'),
        ('left', 'IMPLICATION'),   
        ('left', 'AUB', 'AUM'),
        ('left', 'LPAREN', 'RPAREN', 'TRUE', 'FALSE'),
        )            

        lexer = lex.lex()
        lexer.input(formula)

And the parsing rules as

        def p_double_neg_paren(p):
            '''formula : NEGATION LPAREN NEGATION LPAREN PROPOSITION RPAREN RPAREN
            '''
            stack.append(p[5].strip())

        def p_double_neg(p):
            '''formula : NEGATION NEGATION PROPOSITION
            '''
            stack.append(p[3].strip())

        def p_double_neg_inner_paren(p):
            '''formula : NEGATION NEGATION LPAREN PROPOSITION RPAREN
            '''
            stack.append(p[4].strip())

        def p_double_neg_mid_paren(p):
            '''formula : NEGATION LPAREN NEGATION PROPOSITION RPAREN
            '''
            stack.append(p[4].strip())

        def p_groupAssignment(p):
            '''formula : PROPOSITION ASSIGNMENT ASSIGNVAL
            '''
            stack.append(p[1].strip() + p[2].strip() + p[3].strip())

        def p_neg_paren_take_outer_token(p):
            '''formula : NEGATION LPAREN PROPOSITION RPAREN
                       | NEGATION LPAREN TRUE RPAREN
                       | NEGATION LPAREN FALSE RPAREN
            '''
            stack.append(Neg(p[3]))

        def p_neg_take_outer_token(p):
            '''formula : NEGATION PROPOSITION
                       | NEGATION TRUE
                       | NEGATION FALSE
            '''

            stack.append(Neg(p[2].strip()))

        def p_neg_take_outer_token_paren(p):
            '''formula : LPAREN NEGATION PROPOSITION RPAREN
                       | LPAREN NEGATION TRUE RPAREN
                       | LPAREN NEGATION FALSE RPAREN
            '''
            stack.append(Neg(p[3].strip()))

        def p_unary_paren_nest_take_outer_token(p):
            '''formula : GLOBAL LPAREN LPAREN NEGATION formula RPAREN RPAREN
                       | NEXT LPAREN LPAREN NEGATION formula RPAREN RPAREN
                       | FUTURE LPAREN LPAREN NEGATION formula RPAREN RPAREN
            '''
            if len(stack) >= 1:
                if p[1].strip() == 'AG':
                    stack.append(['AG', ['!', stack.pop()]])
                elif p[1].strip() == 'AF':
                    stack.append(['AF', ['!', stack.pop()]])
                elif p[1].strip() == 'AX':    
                    stack.append(['AX', ['!', stack.pop()]])


        def p_unary_paren_take_outer_token(p):
            '''formula : GLOBAL LPAREN formula RPAREN
                       | NEXT LPAREN formula RPAREN
                       | FUTURE LPAREN formula RPAREN
            '''
            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_unary_take_outer_token(p):
            '''formula : GLOBAL formula
                       | NEXT formula
                       | FUTURE formula
            '''

            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_unary_take_outer_token_prop(p):
            '''formula : GLOBAL PROPOSITION
                       | NEXT PROPOSITION
                       | FUTURE PROPOSITION
            '''

            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_binary_take_outer_token(p):
            '''formula : formula CONJUNCTION formula 
                       | formula DISJUNCTION formula 
                       | formula EQUIVALENCE formula
                       | formula IMPLICATION formula
            '''

            if len(stack) >= 2:
                a, b = stack.pop(), stack.pop()
                if self.IMPLICATION.search(p[2].strip()) and not self.EQUIVALENCE.search(p[2].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[2].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[2].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[2].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_paren_take_outer_token(p):
            '''formula : LPAREN formula RPAREN CONJUNCTION LPAREN formula RPAREN 
                       | LPAREN formula RPAREN DISJUNCTION LPAREN formula RPAREN
                       | LPAREN formula RPAREN EQUIVALENCE LPAREN formula RPAREN 
                       | LPAREN formula RPAREN IMPLICATION LPAREN formula RPAREN
            '''
            if len(stack) >= 2:
                a, b = stack.pop(), stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_lparen_take_outer_token(p):
            '''formula : LPAREN formula RPAREN CONJUNCTION formula 
                       | LPAREN formula RPAREN DISJUNCTION formula
                       | LPAREN formula RPAREN EQUIVALENCE formula 
                       | LPAREN formula RPAREN IMPLICATION formula
            '''
            if len(stack) >= 2:
                a = stack.pop()
                b = stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_rparen_take_outer_token(p):
            '''formula : formula CONJUNCTION LPAREN formula RPAREN 
                       | formula DISJUNCTION LPAREN formula RPAREN
                       | formula EQUIVALENCE LPAREN formula RPAREN 
                       | formula IMPLICATION LPAREN formula RPAREN
            '''
            if len(stack) >= 2:
                a = stack.pop()
                b = stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_proposition_take_token_paren(p):
            '''formula : LPAREN formula RPAREN
            '''
            stack.append(p[2].strip())

        def p_proposition_take_token_atom(p):
            '''formula : LPAREN PROPOSITION RPAREN
            '''
            stack.append(p[2].strip())

        def p_proposition_take_token(p):
            '''formula : PROPOSITION
            '''
            stack.append(p[1].strip())

        def p_true_take_token(p):
            '''formula : TRUE
                       '''
            stack.append(p[1].strip())

        def p_false_take_token(p):
            '''formula : FALSE
                       '''
            stack.append(p[1].strip())

        # Error rule for syntax errors
        def p_error(p):
            print "Syntax error in input!: " + str(p)
            os.system("pause")
            return 0

I can see the lex/yacc rules are fairly messy; I've removed much of the debugging code from each rule for brevity and tidiness, but can anyone see where I'm going wrong here? Should I move the handling of parentheses to another method, or can it be done with what I have now? Is there some other way I can process these formula strings into the predefined class operations without getting all the shift/reduce conflicts?

Sorry for airing all my dirty code online, but I could really use some help with something that's been bugging me for months. Thanks.

START SMALL!!! Regardless of the parser library you end up using, try doing just a simple binary operation like expr & expr, and get that working. Then add support for '|'. Now you have two different operators, you have enough to represent precedence of operations, and parentheses actually play a part. This BNF would look something like:

atom := TRUE | FALSE | '(' expr ')'
and_op := atom '&' atom
or_op := and_op '|' and_op
expr := or_op

Do you see how this works? No explicit "take left paren", "pop right paren" stuff. Figure out what the precedence of your other operations is going to be, and then extend this recursive infix-notation parser to reflect them. But DON'T DO ANYTHING ELSE until you get this minimal bit working first. Otherwise, you are just trying to solve too much at once.
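In case it helps, here is one way that mini-BNF could be sketched as a hand-rolled recursive-descent parser in plain Python (no parser library, so the grammar structure stays visible). The function names and the nested-tuple AST are just illustrative, and I've loosened the two binary rules to allow chains of the same operator:

```python
import re

def tokenize(s):
    # Split the input into TRUE/FALSE keywords, parens, and the two operators.
    return re.findall(r"TRUE|FALSE|[&|()]", s)

def parse_expr(tokens):
    # expr := or_op
    node, rest = parse_or(tokens)
    if rest:
        raise SyntaxError("trailing tokens: %r" % rest)
    return node

def parse_or(tokens):
    # or_op := and_op ('|' and_op)*   -- '|' binds looser than '&'
    node, tokens = parse_and(tokens)
    while tokens and tokens[0] == "|":
        rhs, tokens = parse_and(tokens[1:])
        node = ("|", node, rhs)
    return node, tokens

def parse_and(tokens):
    # and_op := atom ('&' atom)*
    node, tokens = parse_atom(tokens)
    while tokens and tokens[0] == "&":
        rhs, tokens = parse_atom(tokens[1:])
        node = ("&", node, rhs)
    return node, tokens

def parse_atom(tokens):
    # atom := TRUE | FALSE | '(' expr ')'
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok, rest = tokens[0], tokens[1:]
    if tok in ("TRUE", "FALSE"):
        return tok, rest
    if tok == "(":
        node, rest = parse_or(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return node, rest[1:]
    raise SyntaxError("unexpected token %r" % tok)
```

Note how '(' expr ')' is handled by a single alternative inside parse_atom, and how precedence falls out of which function calls which. That is the shape your yacc grammar wants too, rather than enumerating a parenthesized variant of every rule.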

Parsers are frustrating, and your notation is a non-trivial one. Creating a parser for infix notation takes a certain mindset. But if this is a core part of your system, you'll have to get it working at some point. I'm not sure what your confusion with lepl was; I believe it is fairly similar in concept to pyparsing. In the spirit of SO, maybe I can post a pyparsing starter for you.

Your BNF didn't really match your lexing code, since your code includes references to the '<->' and '->' operators, an assignment statement, and a proposition, which I assume is basically a lowercase identifier. I looked for an online reference for this language but didn't find one. Also, you didn't post any test cases. So I took a best guess at what your language's BNF is supposed to be.

"""
phi ::= p 
        !p 
        phi_0 & phi_1 
        phi_0 | phi_1 
        AX phi 
        AF phi 
        AG phi 
        AU phi_0 U phi_1
"""

from pyparsing import *

LPAR,RPAR = map(Suppress,"()")
NOT = Literal("!")
TRUE = Keyword("TRUE")
FALSE = Keyword("FALSE")
AX, AF, AG, AU, U = map(Keyword, "AX AF AG AU U".split())
AND_OP = "&"
OR_OP = "|"
ident = Word(alphas.lower())

phi = Forward()
p = Optional(NOT) + (TRUE | FALSE | ident | Group(LPAR + phi + RPAR) )
binand = p + ZeroOrMore(AND_OP + p)
binor = binand + ZeroOrMore(OR_OP + binand)
phi << (
    Group(AX + phi) |
    Group(AF + phi) |
    Group(AG + phi) |
    Group(AU + phi + U + phi) |
    binor)

assign = ident + "=" + Group(phi)
equiv = Group(phi) + "<->" + Group(phi)
implicate = Group(phi) + "->" + Group(phi)

statement = assign | equiv | implicate

tests = """\
a=TRUE
b = FALSE
c = !TRUE
d <-> b & !c
AG b & d -> !e""".splitlines()

for t in tests:
    print statement.parseString(t).asList()

Prints:

['a', '=', ['TRUE']]
['b', '=', ['FALSE']]
['c', '=', ['!', 'TRUE']]
[['d'], '<->', ['b', '&', '!', 'c']]
[[['AG', 'b', '&', 'd']], '->', ['!', 'e']]

The Group classes help structure the results into a quasi-AST. There are a number of examples on the pyparsing wiki that will help you take it from here. I would recommend looking at the simpleBool.py example to see how to have the parser produce an evaluator.
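If you want the parser to feed an evaluator, a small walker over those nested lists is enough for the purely propositional parts. This is only a sketch under assumptions: it expects the flat token-list shape printed above, evaluates operator chains strictly left to right, and ignores the temporal operators (AX/AF/AG/AU), which need a model to evaluate against:

```python
def eval_phi(node, env):
    # Evaluate the propositional fragment of the quasi-AST printed above.
    # `node` is a token string or a (possibly nested) list of tokens such
    # as ['b', '&', '!', 'c']; `env` maps proposition names to booleans.
    if isinstance(node, str):
        if node == "TRUE":
            return True
        if node == "FALSE":
            return False
        return env[node]

    def operand(i):
        # An operand may be prefixed by any number of '!' tokens.
        negate = False
        while node[i] == "!":
            negate = not negate
            i += 1
        val = eval_phi(node[i], env)
        return (not val if negate else val), i + 1

    # Walk the flat list: operand (op operand)*, strictly left to right.
    result, i = operand(0)
    while i < len(node):
        op = node[i]
        rhs, i = operand(i + 1)
        if op == "&":
            result = result and rhs
        elif op == "|":
            result = result or rhs
    return result
```

Against the sample output above, eval_phi(['b', '&', '!', 'c'], {'b': True, 'c': False}) comes out True. For the temporal operators you would dispatch on a leading 'AX'/'AF'/'AG'/'AU' token and recurse with whatever model-checking semantics you need.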
