[英]Resolving shift/reduce conflicts with PLY
I have the following grammar for the setlx language in PLY: 对于PLY中的setlx语言 ,我有以下语法:
Rule 0 S' -> file_input
Rule 1 file_input -> statement_list
Rule 2 epsilon -> <empty>
Rule 3 statement_list -> statement
Rule 4 statement_list -> statement_list statement
Rule 5 statement -> simple_statement SEMICOLON
Rule 6 statement -> compound_statement
Rule 7 simple_statement -> expression_statement
Rule 8 simple_statement -> assert_statement
Rule 9 simple_statement -> assignment_statement
Rule 10 simple_statement -> augmented_assign_statement
Rule 11 simple_statement -> backtrack_statement
Rule 12 simple_statement -> break_statement
Rule 13 simple_statement -> continue_statement
Rule 14 simple_statement -> exit_statement
Rule 15 simple_statement -> return_statement
Rule 16 simple_statement -> quantor
Rule 17 simple_statement -> term
Rule 18 expression_statement -> expression
Rule 19 backtrack_statement -> BACKTRACK
Rule 20 break_statement -> BREAK
Rule 21 continue_statement -> CONTINUE
Rule 22 exit_statement -> EXIT
Rule 23 return_statement -> RETURN
Rule 24 return_statement -> RETURN expression
Rule 25 expression_list -> expression
Rule 26 expression_list -> expression_list COMMA expression
Rule 27 expression -> implication
Rule 28 expression -> lambda_definition
Rule 29 expression -> implication EQUIVALENT implication
Rule 30 expression -> implication ANTIVALENT implication
Rule 31 implication -> disjunction
Rule 32 implication -> disjunction IMPLICATES disjunction
Rule 33 disjunction -> conjunction
Rule 34 disjunction -> disjunction OR conjunction
Rule 35 conjunction -> comparison
Rule 36 conjunction -> conjunction AND comparison
Rule 37 comparison -> sum
Rule 38 comparison -> sum EQ sum
Rule 39 comparison -> sum NEQ sum
Rule 40 comparison -> sum LT sum
Rule 41 comparison -> sum LE sum
Rule 42 comparison -> sum GT sum
Rule 43 comparison -> sum GE sum
Rule 44 comparison -> sum IN sum
Rule 45 comparison -> sum NOTIN sum
Rule 46 sum -> product
Rule 47 sum -> sum PLUS product
Rule 48 sum -> sum MINUS product
Rule 49 product -> reduce
Rule 50 product -> product TIMES reduce
Rule 51 product -> product DIVIDE reduce
Rule 52 product -> product IDIVIDE reduce
Rule 53 product -> product MOD reduce
Rule 54 product -> product CARTESIAN reduce
Rule 55 reduce -> unary_expression
Rule 56 reduce -> reduce SUM unary_expression
Rule 57 reduce -> reduce PRODUCT unary_expression
Rule 58 unary_expression -> power
Rule 59 unary_expression -> SUM unary_expression
Rule 60 unary_expression -> PRODUCT unary_expression
Rule 61 unary_expression -> HASH unary_expression
Rule 62 unary_expression -> MINUS unary_expression
Rule 63 unary_expression -> AT unary_expression
Rule 64 unary_expression -> BANG unary_expression
Rule 65 power -> primary
Rule 66 power -> primary POW unary_expression
Rule 67 primary -> atom
Rule 68 primary -> attributeref
Rule 69 primary -> subscription
Rule 70 primary -> slicing
Rule 71 primary -> procedure
Rule 72 primary -> call
Rule 73 primary -> primary BANG
Rule 74 atom -> identifier
Rule 75 atom -> literal
Rule 76 atom -> enclosure
Rule 77 identifier -> IDENTIFIER
Rule 78 identifier -> UNUSED
Rule 79 attributeref -> primary DOT identifier
Rule 80 subscription -> primary LBRACKET expression RBRACKET
Rule 81 slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82 lower_bound -> expression
Rule 83 lower_bound -> epsilon
Rule 84 upper_bound -> expression
Rule 85 upper_bound -> epsilon
Rule 86 literal -> stringliteral
Rule 87 literal -> integer
Rule 88 literal -> floatnumber
Rule 89 literal -> boolean
Rule 90 stringliteral -> STRING
Rule 91 stringliteral -> LITERAL
Rule 92 integer -> INTEGER
Rule 93 floatnumber -> DOUBLE
Rule 94 boolean -> TRUE
Rule 95 boolean -> FALSE
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 98 enclosure -> list_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 100 set_display -> LBRACE expression RANGE expression RBRACE
Rule 101 set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 103 list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
Rule 106 lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107 lambda_parameters -> identifier
Rule 108 lambda_parameters -> LT parameter_list GT
Rule 109 assignment_statement -> target ASSIGN expression
Rule 110 target -> expression
Rule 111 augmented_assign_statement -> augtarget augop expression
Rule 112 augtarget -> identifier
Rule 113 augtarget -> attributeref
Rule 114 augtarget -> subscription
Rule 115 augop -> PLUS_EQUAL
Rule 116 augop -> MINUS_EQUAL
Rule 117 augop -> TIMES_EQUAL
Rule 118 augop -> DIVIDE_EQUAL
Rule 119 augop -> IDIVIDE_EQUAL
Rule 120 augop -> MOD_EQUAL
Rule 121 assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122 term -> TERM LPAREN term_arguments RPAREN
Rule 123 term_arguments -> expression_list
Rule 124 term_arguments -> epsilon
Rule 125 procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126 procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127 parameter_list -> procedure_param
Rule 128 parameter_list -> parameter_list COMMA procedure_param
Rule 129 parameter_list -> epsilon
Rule 130 procedure_param -> identifier
Rule 131 call -> primary LPAREN argument_list RPAREN
Rule 132 call -> primary LPAREN RPAREN
Rule 133 argument_list -> expression
Rule 134 argument_list -> argument_list COMMA expression
Rule 135 quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136 quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137 iterator -> target IN expression
Rule 138 iterator_chain -> iterator
Rule 139 iterator_chain -> iterator_chain COMMA iterator
Rule 140 compound_statement -> if_statement
Rule 141 compound_statement -> switch_statement
Rule 142 compound_statement -> match_statement
Rule 143 compound_statement -> while_loop
Rule 144 compound_statement -> do_while_loop
Rule 145 compound_statement -> for_loop
Rule 146 block -> statement_list
Rule 147 block -> epsilon
Rule 148 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151 switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152 case_statements -> case_list
Rule 153 case_statements -> epsilon
Rule 154 case_list -> case_statement
Rule 155 case_list -> case_list case_statement
Rule 156 case_statement -> CASE expression COLON block
Rule 157 default_case -> DEFAULT COLON block
Rule 158 default_case -> epsilon
Rule 159 match_statement -> MATCH
Rule 160 while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161 do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162 for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE
On the last few meters, I now get some conflicts: 在最后几米处,我现在遇到一些冲突:
WARNING:
WARNING: Conflicts:
WARNING:
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift
How can I resolve them without generating new conflicts? 我如何解决它们而不产生新的冲突? I understand where they come from, but I have no idea about fixing it. 我知道它们的来源,但是我不知道要解决它。 Any help or general advice is appriciated. 任何帮助或一般性建议均适用。
I'll do these backwards, because that way we go from easiest to hardest. 我会向后进行这些操作,因为那样一来,我们就会从最简单的工作变成最困难的工作。 In fact, I don't really have a solution for the first conflict. 实际上,对于第一个冲突,我真的没有解决方案。
The third conflict is the result of an actual ambiguity in the grammar. 第三个冲突是语法中实际模棱两可的结果。 You need to get rid of the ambiguity: 您需要摆脱歧义:
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 133 argument_list -> expression
Consequently, if we're looking for an enclosure
and we find a simple parenthesized expression, it could either be a parenth_form
or it could be a set_display
containing an argument_list
of exactly one expression. 因此,如果我们要寻找一个enclosure
,并且找到一个简单的带括号的表达式,则它可以是parenth_form
,也可以是set_display
其中set_display
包含一个表达式的argument_list
。 I suspect that the intention here is that a simple parenthesized expression would be a parenth_form
, but there's no way to tell from the grammar. 我怀疑这里的意图是,用一个简单的括号括起来的表达式将是一个parenth_form
,但是没有办法从语法中分辨出来。
The simplest solution would be to just get rid of parenth_form
altogether, and check for the case of a one-element argument_list
when you build the AST node for a set_display
corresponding to rule 102. Another possibility is to be explicit about it; 最简单的解决方案是完全摆脱parenth_form
,并在为规则102对应的set_display
构建AST节点时检查单元素argument_list
的情况。 change Rule 102 to require the set_display
to have at least two expressions: 更改规则102以要求set_display
至少具有两个表达式:
set_display -> LPAREN expression COMMA argument_list RPAREN
That still requires you to juggle the AST, though, because you have to prepend the expression
to the argument_list
when you build the set_display
node. 但是,这仍然需要您处理AST,因为在构建set_display
节点时必须在expression
添加argument_list
到argument_list
。
The second S/R conflict is actually quite similar. 第二个S / R冲突实际上非常相似。 It arises because of: 出现此问题的原因是:
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
So: 所以:
LBRACKET expression COMMA expression ...
would require reduction by Rule 104 if the following symbol is RANGE
; 如果以下符号为RANGE
,将需要根据规则104进行减少; by Rule 105 if the following symbol is RBRACKET
; 如果以下符号是RBRACKET
, RBRACKET
规则105 RBRACKET
; and by Rule 134 if the following symbol is COMMA
. 如果以下符号为COMMA
规则134 COMMA
。 (That's a rough approximation, since it assumes that we already know the end of the second expression
.) As written, though, the grammar needs to commit to one of these paths as soon as it sees the first COMMA
, because it needs to decide at that moment whether to create an argument_list
or not. (这是一个粗略的近似值,因为它假定我们已经知道第二个expression
的结尾。)但是,如所写,语法需要在看到第一个COMMA
立即提交这些路径COMMA
,因为它需要确定那时是否要创建一个argument_list
。
The solution is to delay the parser's decision, which is easy but ugly: 解决方案是延迟解析器的决定,这很简单但是很丑陋:
list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET
Now, the first COMMA
is always shifted and the decision on what type of list_display
to reduce is delayed until the end of the second expression
(if there are two expression
s), but it's necessary to juggle the AST for the last two productions to correct the argument_list
. 现在,第一个COMMA
总是移位,并且要减少哪种类型的list_display
的决定被延迟到第二个expression
的末尾(如果有两个expression
),但是必须对AST进行调整,以使最后两个生成更正argument_list
。
The first S/R conflict arises because IN
is used both as an operator and as a syntactic part of an iterator
: 出现第一个S / R冲突是因为IN
既用作运算符又用作iterator
的句法部分:
Rule 44 comparison -> sum IN sum
Rule 137 iterator -> target IN expression
But because target
is just an expression
, and expression
can derive sum
, it's not possible (most of the time) for the parser to know which IN
it's looking at until much later in the parse. 但是因为target
只是一个expression
,并且expression
可以派生sum
,所以解析器(在大多数情况下)不可能知道它IN
看哪个IN
,直到后面的解析。
The previous technique of deferring the decision won't work here, because you need to know which type of IN
you're looking at in order to correctly apply operator precedence. 延迟决策的先前技术在这里不起作用,因为您需要知道您要查找哪种IN
类型才能正确应用运算符优先级。 Suppose we're in a context where we need an iterator
and the input is: 假设我们处于需要iterator
且输入为的上下文中:
atom1 AND atom2 IN atom3
If that's the iterator (ie, the next symbol is COMMA
or RPAREN
), then that is, effectively: 如果那是迭代器(即,下一个符号是COMMA
或RPAREN
),则实际上是:
( atom1 AND atom2 ) IN atom3
However, if that's the left-hand side of an iterator, then it needs to be parsed completely differently: 但是,如果这是迭代器的左侧,则需要完全不同地解析它:
( atom1 AND ( atom2 IN atom3 ) ) IN expression
Moreover, atom3
could have been an arbitrary expression, perhaps atom3 AND atom4
, leading to the two parses: 而且, atom3
可能是任意表达式,也许是atom3 AND atom4
,从而导致两个解析:
( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression
This is why puns are bad in language design. 这就是双关语在语言设计上不好的原因。
I strongly suspect that there is no LR(k)
grammar which will be able to parse that particular corner of your language, although that's just based on intuition; 我强烈怀疑没有LR(k)
语法能够解析您语言的特定角落,尽管那只是基于直觉。 I have no proof. 我没有证据 However, a GLR parser would have no trouble with it, because it is not actually ambiguous. 但是,GLR解析器不会遇到任何麻烦,因为它实际上并不是模棱两可的。 I don't know if there is a GLR parser generator in Python; 我不知道Python中是否有GLR解析器生成器; if you're not tied to Python, you could certainly use bison
. 如果您不依赖Python,则可以使用bison
。
The GLR parser would also have solved the second conflict, which is also not the result of an ambiguity. GLR解析器还可以解决第二个冲突,这也不是模棱两可的结果。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.