Resolving shift/reduce conflicts with PLY
I have the following grammar for the setlx language in PLY:
Rule 0 S' -> file_input
Rule 1 file_input -> statement_list
Rule 2 epsilon -> <empty>
Rule 3 statement_list -> statement
Rule 4 statement_list -> statement_list statement
Rule 5 statement -> simple_statement SEMICOLON
Rule 6 statement -> compound_statement
Rule 7 simple_statement -> expression_statement
Rule 8 simple_statement -> assert_statement
Rule 9 simple_statement -> assignment_statement
Rule 10 simple_statement -> augmented_assign_statement
Rule 11 simple_statement -> backtrack_statement
Rule 12 simple_statement -> break_statement
Rule 13 simple_statement -> continue_statement
Rule 14 simple_statement -> exit_statement
Rule 15 simple_statement -> return_statement
Rule 16 simple_statement -> quantor
Rule 17 simple_statement -> term
Rule 18 expression_statement -> expression
Rule 19 backtrack_statement -> BACKTRACK
Rule 20 break_statement -> BREAK
Rule 21 continue_statement -> CONTINUE
Rule 22 exit_statement -> EXIT
Rule 23 return_statement -> RETURN
Rule 24 return_statement -> RETURN expression
Rule 25 expression_list -> expression
Rule 26 expression_list -> expression_list COMMA expression
Rule 27 expression -> implication
Rule 28 expression -> lambda_definition
Rule 29 expression -> implication EQUIVALENT implication
Rule 30 expression -> implication ANTIVALENT implication
Rule 31 implication -> disjunction
Rule 32 implication -> disjunction IMPLICATES disjunction
Rule 33 disjunction -> conjunction
Rule 34 disjunction -> disjunction OR conjunction
Rule 35 conjunction -> comparison
Rule 36 conjunction -> conjunction AND comparison
Rule 37 comparison -> sum
Rule 38 comparison -> sum EQ sum
Rule 39 comparison -> sum NEQ sum
Rule 40 comparison -> sum LT sum
Rule 41 comparison -> sum LE sum
Rule 42 comparison -> sum GT sum
Rule 43 comparison -> sum GE sum
Rule 44 comparison -> sum IN sum
Rule 45 comparison -> sum NOTIN sum
Rule 46 sum -> product
Rule 47 sum -> sum PLUS product
Rule 48 sum -> sum MINUS product
Rule 49 product -> reduce
Rule 50 product -> product TIMES reduce
Rule 51 product -> product DIVIDE reduce
Rule 52 product -> product IDIVIDE reduce
Rule 53 product -> product MOD reduce
Rule 54 product -> product CARTESIAN reduce
Rule 55 reduce -> unary_expression
Rule 56 reduce -> reduce SUM unary_expression
Rule 57 reduce -> reduce PRODUCT unary_expression
Rule 58 unary_expression -> power
Rule 59 unary_expression -> SUM unary_expression
Rule 60 unary_expression -> PRODUCT unary_expression
Rule 61 unary_expression -> HASH unary_expression
Rule 62 unary_expression -> MINUS unary_expression
Rule 63 unary_expression -> AT unary_expression
Rule 64 unary_expression -> BANG unary_expression
Rule 65 power -> primary
Rule 66 power -> primary POW unary_expression
Rule 67 primary -> atom
Rule 68 primary -> attributeref
Rule 69 primary -> subscription
Rule 70 primary -> slicing
Rule 71 primary -> procedure
Rule 72 primary -> call
Rule 73 primary -> primary BANG
Rule 74 atom -> identifier
Rule 75 atom -> literal
Rule 76 atom -> enclosure
Rule 77 identifier -> IDENTIFIER
Rule 78 identifier -> UNUSED
Rule 79 attributeref -> primary DOT identifier
Rule 80 subscription -> primary LBRACKET expression RBRACKET
Rule 81 slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82 lower_bound -> expression
Rule 83 lower_bound -> epsilon
Rule 84 upper_bound -> expression
Rule 85 upper_bound -> epsilon
Rule 86 literal -> stringliteral
Rule 87 literal -> integer
Rule 88 literal -> floatnumber
Rule 89 literal -> boolean
Rule 90 stringliteral -> STRING
Rule 91 stringliteral -> LITERAL
Rule 92 integer -> INTEGER
Rule 93 floatnumber -> DOUBLE
Rule 94 boolean -> TRUE
Rule 95 boolean -> FALSE
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 98 enclosure -> list_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 100 set_display -> LBRACE expression RANGE expression RBRACE
Rule 101 set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 103 list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
Rule 106 lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107 lambda_parameters -> identifier
Rule 108 lambda_parameters -> LT parameter_list GT
Rule 109 assignment_statement -> target ASSIGN expression
Rule 110 target -> expression
Rule 111 augmented_assign_statement -> augtarget augop expression
Rule 112 augtarget -> identifier
Rule 113 augtarget -> attributeref
Rule 114 augtarget -> subscription
Rule 115 augop -> PLUS_EQUAL
Rule 116 augop -> MINUS_EQUAL
Rule 117 augop -> TIMES_EQUAL
Rule 118 augop -> DIVIDE_EQUAL
Rule 119 augop -> IDIVIDE_EQUAL
Rule 120 augop -> MOD_EQUAL
Rule 121 assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122 term -> TERM LPAREN term_arguments RPAREN
Rule 123 term_arguments -> expression_list
Rule 124 term_arguments -> epsilon
Rule 125 procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126 procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127 parameter_list -> procedure_param
Rule 128 parameter_list -> parameter_list COMMA procedure_param
Rule 129 parameter_list -> epsilon
Rule 130 procedure_param -> identifier
Rule 131 call -> primary LPAREN argument_list RPAREN
Rule 132 call -> primary LPAREN RPAREN
Rule 133 argument_list -> expression
Rule 134 argument_list -> argument_list COMMA expression
Rule 135 quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136 quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137 iterator -> target IN expression
Rule 138 iterator_chain -> iterator
Rule 139 iterator_chain -> iterator_chain COMMA iterator
Rule 140 compound_statement -> if_statement
Rule 141 compound_statement -> switch_statement
Rule 142 compound_statement -> match_statement
Rule 143 compound_statement -> while_loop
Rule 144 compound_statement -> do_while_loop
Rule 145 compound_statement -> for_loop
Rule 146 block -> statement_list
Rule 147 block -> epsilon
Rule 148 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151 switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152 case_statements -> case_list
Rule 153 case_statements -> epsilon
Rule 154 case_list -> case_statement
Rule 155 case_list -> case_list case_statement
Rule 156 case_statement -> CASE expression COLON block
Rule 157 default_case -> DEFAULT COLON block
Rule 158 default_case -> epsilon
Rule 159 match_statement -> MATCH
Rule 160 while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161 do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162 for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE
On the home stretch, I am now running into a few conflicts:
WARNING:
WARNING: Conflicts:
WARNING:
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift
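How can I resolve them without producing new conflicts? I know where they come from, but I don't know how to fix them. Any help or general advice is appreciated.
I'm going to take these in reverse order, because that way we go from the easiest to the hardest. In fact, I don't really have a solution for the first conflict.
The third conflict is the result of an actual ambiguity in the grammar. You need to get rid of the ambiguity: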
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 133 argument_list -> expression
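So if we're looking for an enclosure and we find a simple parenthesized expression, it could be a parenth_form, or it could be a set_display whose argument_list consists of exactly one expression. I suspect the intent here is that a simple parenthesized expression should be a parenth_form, but there is no way to tell that from the grammar.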
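The simplest solution would be to get rid of parenth_form altogether, and check for the case of a one-element argument_list when building the AST node for the set_display corresponding to Rule 102. Alternatively, be explicit about it and change Rule 102 to require a set_display to have at least two expressions: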
set_display -> LPAREN expression COMMA argument_list RPAREN
That still requires you to massage the AST, though, because you have to prepend the expression to the argument_list when you build the set_display node.
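The AST adjustments for both options can be sketched as PLY-style action functions. This is only a minimal sketch under assumptions that are not in the question: the tuple node shape `("set_display", [...])` is hypothetical, and the `argument_list` actions are assumed to produce plain Python lists. The two functions are alternatives, so a real parser module would define only one of them. Because the functions only index `p`, a plain list can stand in for PLY's production object when trying them out.

```python
# Option A: keep parenth_form and use the rewritten rule that
# requires at least two expressions.
def p_set_display_args(p):
    """set_display : LPAREN expression COMMA argument_list RPAREN"""
    # p[2] is the leading expression, p[4] the remaining arguments
    # (assumed to be a plain list). Prepend so the node gets one flat list.
    p[0] = ("set_display", [p[2]] + p[4])


# Option B: drop parenth_form, keep Rule 102 as written, and treat a
# one-element argument_list as a plain parenthesized expression.
def p_set_display_or_parens(p):
    """set_display : LPAREN argument_list RPAREN"""
    args = p[2]
    p[0] = args[0] if len(args) == 1 else ("set_display", args)


# Plain lists stand in for PLY's production object; index 0 gets the result.
two_or_more = [None, "(", "a", ",", ["b", "c"], ")"]
p_set_display_args(two_or_more)
print(two_or_more[0])  # ('set_display', ['a', 'b', 'c'])

single = [None, "(", ["a"], ")"]
p_set_display_or_parens(single)
print(single[0])  # a
```

Option A keeps the distinction in the grammar, while Option B moves it into the action code. Either way the parser no longer has two ways to reduce a single parenthesized expression.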
The second S/R conflict is actually quite similar. The problem arises because of:
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
So:
LBRACKET expression COMMA expression ...
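will need to be parsed with Rule 104 if the following symbol is RANGE, with Rule 105 if the following symbol is RBRACKET, and with Rule 134 if the following symbol is COMMA. (This is a rough approximation, because it assumes that we already know where the second expression ends.) However, as written, the grammar has to commit to one of these paths as soon as it sees the first COMMA, because at that point it needs to decide whether or not to create an argument_list.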
The solution is to delay the parser's decision, which is simple but ugly:
list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET
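Now the first COMMA is always shifted, and the decision about which kind of list_display to reduce is delayed until the end of the second expression (if there are two expressions). The AST has to be adjusted for the last two productions, though, so that they produce a correct argument_list.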
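The four list_display productions above could be paired with actions along the following lines. Again this is a hedged sketch rather than the asker's code: the node shapes `("list", [...])` and `("list_range", first, second, last)` are made up for illustration, and `argument_list` is assumed to yield a plain Python list. The last two functions do the AST adjustment, so that both forms end up with the same flat element list that the original Rule 105 would have produced.

```python
def p_list_display_range(p):
    """list_display : LBRACKET expression RANGE expression RBRACKET"""
    # No explicit second element, so that slot is None.
    p[0] = ("list_range", p[2], None, p[4])


def p_list_display_step_range(p):
    """list_display : LBRACKET expression COMMA expression RANGE expression RBRACKET"""
    p[0] = ("list_range", p[2], p[4], p[6])


def p_list_display_single(p):
    """list_display : LBRACKET expression RBRACKET"""
    # Wrap the lone expression so the node always carries a list.
    p[0] = ("list", [p[2]])


def p_list_display_many(p):
    """list_display : LBRACKET expression COMMA argument_list RBRACKET"""
    # Prepend the first expression to the remaining arguments.
    p[0] = ("list", [p[2]] + p[4])


# Plain lists stand in for PLY's production object.
one = [None, "[", "x", "]"]
p_list_display_single(one)
many = [None, "[", "x", ",", ["y", "z"], "]"]
p_list_display_many(many)
stepped = [None, "[", "1", ",", "3", "..", "9", "]"]
p_list_display_step_range(stepped)
print(one[0], many[0], stepped[0])
```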
The first S/R conflict arises because IN is used both as an operator and as a syntactic part of iterator:
Rule 44 comparison -> sum IN sum
Rule 137 iterator -> target IN expression
But since target is just an expression, and expression can derive sum, it is (in most cases) impossible for the parser to know which IN it is looking at until much later in the parse.
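The previous technique of delaying the decision won't work here, because you need to know which kind of IN you are looking at in order to apply operator precedence correctly. Suppose we are in a context that requires an iterator and the input is: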
atom1 AND atom2 IN atom3
If that is the whole iterator (i.e., the next symbol is COMMA or RPAREN), then it is actually:
( atom1 AND atom2 ) IN atom3
But if that is only the left-hand side of the iterator, it needs to be parsed completely differently:
( atom1 AND ( atom2 IN atom3 ) ) IN expression
Moreover, atom3 could be an arbitrary expression, perhaps atom3 AND atom4, which leads to the two parses:
( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression
This is why puns are bad in language design.
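To make the problem concrete, the two readings can be written as nested tuples and flattened back into tokens. This is purely an illustration with a hypothetical `(operator, left, right)` tree shape, not part of any parser: it only checks that both trees cover exactly the same token sequence, which is why the parser cannot tell them apart until it sees what follows.

```python
def leaves(tree):
    """Flatten an (operator, left, right) tree back into its token sequence."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return leaves(left) + [op] + leaves(right)


# Reading 1: the input is the complete iterator, so IN is the iterator keyword.
as_iterator = ("IN", ("AND", "atom1", "atom2"), ("AND", "atom3", "atom4"))

# Reading 2: the input is only the target, so IN is the comparison operator
# and a second "IN expression" is still to come.
as_target = ("AND", ("AND", "atom1", ("IN", "atom2", "atom3")), "atom4")

print(" ".join(leaves(as_iterator)))
print(leaves(as_iterator) == leaves(as_target))  # True
```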
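I strongly suspect that there is no LR(k) grammar that can parse this particular corner of your language, although that is based only on intuition; I have no proof. A GLR parser would have no trouble with it, however, because it is not actually ambiguous. I don't know whether there is a GLR parser generator for Python; if you are not tied to Python, you could use bison.
A GLR parser would also have resolved the second conflict, which is likewise not the result of an ambiguity.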