Resolving shift/reduce conflicts with PLY

I have the following grammar for the setlx language in PLY:

Rule 0     S' -> file_input
Rule 1     file_input -> statement_list
Rule 2     epsilon -> <empty>
Rule 3     statement_list -> statement
Rule 4     statement_list -> statement_list statement
Rule 5     statement -> simple_statement SEMICOLON
Rule 6     statement -> compound_statement
Rule 7     simple_statement -> expression_statement
Rule 8     simple_statement -> assert_statement
Rule 9     simple_statement -> assignment_statement
Rule 10    simple_statement -> augmented_assign_statement
Rule 11    simple_statement -> backtrack_statement
Rule 12    simple_statement -> break_statement
Rule 13    simple_statement -> continue_statement
Rule 14    simple_statement -> exit_statement
Rule 15    simple_statement -> return_statement
Rule 16    simple_statement -> quantor
Rule 17    simple_statement -> term
Rule 18    expression_statement -> expression
Rule 19    backtrack_statement -> BACKTRACK
Rule 20    break_statement -> BREAK
Rule 21    continue_statement -> CONTINUE
Rule 22    exit_statement -> EXIT
Rule 23    return_statement -> RETURN
Rule 24    return_statement -> RETURN expression
Rule 25    expression_list -> expression
Rule 26    expression_list -> expression_list COMMA expression
Rule 27    expression -> implication
Rule 28    expression -> lambda_definition
Rule 29    expression -> implication EQUIVALENT implication
Rule 30    expression -> implication ANTIVALENT implication
Rule 31    implication -> disjunction
Rule 32    implication -> disjunction IMPLICATES disjunction
Rule 33    disjunction -> conjunction
Rule 34    disjunction -> disjunction OR conjunction
Rule 35    conjunction -> comparison
Rule 36    conjunction -> conjunction AND comparison
Rule 37    comparison -> sum
Rule 38    comparison -> sum EQ sum
Rule 39    comparison -> sum NEQ sum
Rule 40    comparison -> sum LT sum
Rule 41    comparison -> sum LE sum
Rule 42    comparison -> sum GT sum
Rule 43    comparison -> sum GE sum
Rule 44    comparison -> sum IN sum
Rule 45    comparison -> sum NOTIN sum
Rule 46    sum -> product
Rule 47    sum -> sum PLUS product
Rule 48    sum -> sum MINUS product
Rule 49    product -> reduce
Rule 50    product -> product TIMES reduce
Rule 51    product -> product DIVIDE reduce
Rule 52    product -> product IDIVIDE reduce
Rule 53    product -> product MOD reduce
Rule 54    product -> product CARTESIAN reduce
Rule 55    reduce -> unary_expression
Rule 56    reduce -> reduce SUM unary_expression
Rule 57    reduce -> reduce PRODUCT unary_expression
Rule 58    unary_expression -> power
Rule 59    unary_expression -> SUM unary_expression
Rule 60    unary_expression -> PRODUCT unary_expression
Rule 61    unary_expression -> HASH unary_expression
Rule 62    unary_expression -> MINUS unary_expression
Rule 63    unary_expression -> AT unary_expression
Rule 64    unary_expression -> BANG unary_expression
Rule 65    power -> primary
Rule 66    power -> primary POW unary_expression
Rule 67    primary -> atom
Rule 68    primary -> attributeref
Rule 69    primary -> subscription
Rule 70    primary -> slicing
Rule 71    primary -> procedure
Rule 72    primary -> call
Rule 73    primary -> primary BANG
Rule 74    atom -> identifier
Rule 75    atom -> literal
Rule 76    atom -> enclosure
Rule 77    identifier -> IDENTIFIER
Rule 78    identifier -> UNUSED
Rule 79    attributeref -> primary DOT identifier
Rule 80    subscription -> primary LBRACKET expression RBRACKET
Rule 81    slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82    lower_bound -> expression
Rule 83    lower_bound -> epsilon
Rule 84    upper_bound -> expression
Rule 85    upper_bound -> epsilon
Rule 86    literal -> stringliteral
Rule 87    literal -> integer
Rule 88    literal -> floatnumber
Rule 89    literal -> boolean
Rule 90    stringliteral -> STRING
Rule 91    stringliteral -> LITERAL
Rule 92    integer -> INTEGER
Rule 93    floatnumber -> DOUBLE
Rule 94    boolean -> TRUE
Rule 95    boolean -> FALSE
Rule 96    enclosure -> parenth_form
Rule 97    enclosure -> set_display
Rule 98    enclosure -> list_display
Rule 99    parenth_form -> LPAREN expression RPAREN
Rule 100   set_display -> LBRACE expression RANGE expression RBRACE
Rule 101   set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102   set_display -> LPAREN argument_list RPAREN
Rule 103   list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104   list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105   list_display -> LBRACKET argument_list RBRACKET
Rule 106   lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107   lambda_parameters -> identifier
Rule 108   lambda_parameters -> LT parameter_list GT
Rule 109   assignment_statement -> target ASSIGN expression
Rule 110   target -> expression
Rule 111   augmented_assign_statement -> augtarget augop expression
Rule 112   augtarget -> identifier
Rule 113   augtarget -> attributeref
Rule 114   augtarget -> subscription
Rule 115   augop -> PLUS_EQUAL
Rule 116   augop -> MINUS_EQUAL
Rule 117   augop -> TIMES_EQUAL
Rule 118   augop -> DIVIDE_EQUAL
Rule 119   augop -> IDIVIDE_EQUAL
Rule 120   augop -> MOD_EQUAL
Rule 121   assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122   term -> TERM LPAREN term_arguments RPAREN
Rule 123   term_arguments -> expression_list
Rule 124   term_arguments -> epsilon
Rule 125   procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126   procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127   parameter_list -> procedure_param
Rule 128   parameter_list -> parameter_list COMMA procedure_param
Rule 129   parameter_list -> epsilon
Rule 130   procedure_param -> identifier
Rule 131   call -> primary LPAREN argument_list RPAREN
Rule 132   call -> primary LPAREN RPAREN
Rule 133   argument_list -> expression
Rule 134   argument_list -> argument_list COMMA expression
Rule 135   quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136   quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137   iterator -> target IN expression
Rule 138   iterator_chain -> iterator
Rule 139   iterator_chain -> iterator_chain COMMA iterator
Rule 140   compound_statement -> if_statement
Rule 141   compound_statement -> switch_statement
Rule 142   compound_statement -> match_statement
Rule 143   compound_statement -> while_loop
Rule 144   compound_statement -> do_while_loop
Rule 145   compound_statement -> for_loop
Rule 146   block -> statement_list
Rule 147   block -> epsilon
Rule 148   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150   if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151   switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152   case_statements -> case_list
Rule 153   case_statements -> epsilon
Rule 154   case_list -> case_statement
Rule 155   case_list -> case_list case_statement
Rule 156   case_statement -> CASE expression COLON block
Rule 157   default_case -> DEFAULT COLON block
Rule 158   default_case -> epsilon
Rule 159   match_statement -> MATCH
Rule 160   while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161   do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162   for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE

In the final stretch, I now get some conflicts:

WARNING: 
WARNING: Conflicts:
WARNING: 
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift

How can I resolve them without generating new conflicts? I understand where they come from, but I have no idea how to fix them. Any help or general advice is appreciated.

I'll do these backwards, because that way we go from easiest to hardest. In fact, I don't really have a solution for the first conflict.

The third conflict is the result of an actual ambiguity in the grammar. You need to get rid of the ambiguity:

Rule 96    enclosure -> parenth_form
Rule 97    enclosure -> set_display
Rule 99    parenth_form -> LPAREN expression RPAREN
Rule 102   set_display -> LPAREN argument_list RPAREN
Rule 133   argument_list -> expression

Consequently, if we're looking for an enclosure and we find a simple parenthesized expression, it could either be a parenth_form or it could be a set_display containing an argument_list of exactly one expression. I suspect that the intention here is that a simple parenthesized expression would be a parenth_form, but there's no way to tell from the grammar.

The simplest solution would be to just get rid of parenth_form altogether, and check for the case of a one-element argument_list when you build the AST node for a set_display corresponding to Rule 102. Another possibility is to be explicit about it: change Rule 102 to require the set_display to have at least two expressions:

set_display -> LPAREN expression COMMA argument_list RPAREN

That still requires you to juggle the AST, though, because you have to prepend the expression to the argument_list when you build the set_display node.
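Both options can be sketched as PLY-style action functions. This is only an illustration: the function names, the tuple-shaped AST nodes, and the assumption that argument_list is built as a Python list are mine, not part of the original grammar. In a real grammar module you would keep only one of the two set_display actions.

```python
# Hypothetical AST shape: ("set_display", [expr, ...]); argument_list is a Python list.

def p_argument_list_single(p):
    """argument_list : expression"""
    p[0] = [p[1]]

def p_argument_list_more(p):
    """argument_list : argument_list COMMA expression"""
    p[0] = p[1] + [p[3]]

# Option A: drop parenth_form, keep Rule 102 as is, and unwrap the
# one-element case when building the node.
def p_set_display_paren(p):
    """set_display : LPAREN argument_list RPAREN"""
    args = p[2]
    p[0] = args[0] if len(args) == 1 else ("set_display", args)

# Option B: keep parenth_form and require at least two expressions,
# prepending the first expression to the rest of the list.
def p_set_display_explicit(p):
    """set_display : LPAREN expression COMMA argument_list RPAREN"""
    p[0] = ("set_display", [p[2]] + p[4])
```

Because the actions only index into p, they can be exercised without a parser by passing a plain list whose slot 0 receives the result, for example `p = [None, "(", "a", ",", ["b", "c"], ")"]` for Option B.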

The second S/R conflict is actually quite similar. It arises because of:

Rule 104   list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105   list_display -> LBRACKET argument_list RBRACKET

So:

LBRACKET expression COMMA expression ...

would require reduction by Rule 104 if the following symbol is RANGE; by Rule 105 if the following symbol is RBRACKET; and by Rule 134 if the following symbol is COMMA. (That's a rough approximation, since it assumes that we already know the end of the second expression.) As written, though, the parser needs to commit to one of these paths as soon as it sees the first COMMA, because it needs to decide at that moment whether to create an argument_list or not.

The solution is to delay the parser's decision, which is easy but ugly:

list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET

Now, the first COMMA is always shifted and the decision on what type of list_display to reduce is delayed until the end of the second expression (if there are two expressions), but it's necessary to juggle the AST for the last two productions to correct the argument_list.
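A minimal sketch of the actions for the four rewritten productions might look like this. The function names, the tuple node shapes, and the field order of the range node are assumptions made for illustration; argument_list is again assumed to be a Python list.

```python
# Hypothetical node shapes:
#   ("list_range", first, second_or_None, last)
#   ("list_display", [expr, ...])

def p_list_display_range(p):
    """list_display : LBRACKET expression RANGE expression RBRACKET"""
    p[0] = ("list_range", p[2], None, p[4])

def p_list_display_range_step(p):
    """list_display : LBRACKET expression COMMA expression RANGE expression RBRACKET"""
    p[0] = ("list_range", p[2], p[4], p[6])

def p_list_display_single(p):
    """list_display : LBRACKET expression RBRACKET"""
    # Rebuild the one-element list that argument_list used to provide.
    p[0] = ("list_display", [p[2]])

def p_list_display_many(p):
    """list_display : LBRACKET expression COMMA argument_list RBRACKET"""
    # Prepend the first expression to the rest of the elements.
    p[0] = ("list_display", [p[2]] + p[4])
```

With this shape, the rest of the code sees the same list_display node whether the list had one element or several, so the extra productions stay an internal detail of the parser.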

The first S/R conflict arises because IN is used both as an operator and as a syntactic part of an iterator:

Rule 44    comparison -> sum IN sum
Rule 137   iterator -> target IN expression

But because target is just an expression, and expression can derive sum, it's not possible (most of the time) for the parser to know which IN it's looking at until much later in the parse.

The previous technique of deferring the decision won't work here, because you need to know which type of IN you're looking at in order to correctly apply operator precedence. Suppose we're in a context where we need an iterator and the input is:

atom1 AND atom2 IN atom3

If that's the iterator (i.e., the next symbol is COMMA or RPAREN), then that is, effectively:

( atom1 AND atom2 ) IN atom3

However, if that's the left-hand side of an iterator, then it needs to be parsed completely differently:

( atom1 AND ( atom2 IN atom3 ) ) IN expression

Moreover, atom3 could have been an arbitrary expression, perhaps atom3 AND atom4, leading to the two parses:

( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression
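To make the problem concrete, here is a small illustration that writes the two parses as nested tuples and flattens them back into token sequences. The tuple encoding and the helper name leaves are invented for this example. It shows that the two trees are built from exactly the same first seven tokens, so the parser has to pick a tree shape before it has seen the token that distinguishes them.

```python
def leaves(tree):
    """Flatten a (left, operator, right) tuple tree into its token sequence."""
    if isinstance(tree, str):
        return [tree]
    left, op, right = tree
    return leaves(left) + [op] + leaves(right)

# Reading 1: the whole input is the iterator; IN separates target and expression.
as_iterator = (("atom1", "AND", "atom2"), "IN", ("atom3", "AND", "atom4"))

# Reading 2: the input is only the target; the first IN is the comparison operator.
as_target = (
    (("atom1", "AND", ("atom2", "IN", "atom3")), "AND", "atom4"),
    "IN",
    "expression",
)

shared = leaves(as_iterator)
print(shared)
print(leaves(as_target)[:len(shared)] == shared)
```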

This is why puns are bad in language design.

I strongly suspect that there is no LR(k) grammar which will be able to parse that particular corner of your language, although that's just based on intuition; I have no proof. However, a GLR parser would have no trouble with it, because it is not actually ambiguous. I don't know if there is a GLR parser generator in Python; if you're not tied to Python, you could certainly use bison.

The GLR parser would also have solved the second conflict, which is also not the result of an ambiguity.
