Resolving a shift/reduce conflict for the default label in a switch block

Question

I'm writing a parser for Unrealscript using PLY, and I've run into (hopefully) one of the last ambiguities in the parsing rules that I've set up.

Unrealscript has a keyword, default, which is used differently depending on the context. In a regular statement line, you could use default like so:

default.SomeValue = 3;  // sets the "default" class property to 3

There is also, of course, the default case label for switch statements:

switch (i) {
    case 0:
        break;
    default:
        break;
}

The parser hits an ambiguity when it encounters the default label inside what it takes to be the case's statement block. Here is an example file that runs into a parsing error:

Input

class Example extends Object;

function Test() {
    switch (A) {
        case 0:
            default.SomeValue = 3;    // OK
        default:                      // ERROR: Expected "default.IDENTIFIER"
            break;
    }
}

Parsing Rules

Here are the rules that are in conflict:

All of the rules can be seen in their entirety on GitHub.

`default`

def p_default(p):
    'default : DEFAULT PERIOD identifier'
    p[0] = ('default', p[3])

`switch`

def p_switch_statement(p):
    'switch_statement : SWITCH LPAREN expression RPAREN LCURLY switch_cases RCURLY'
    p[0] = ('switch_statement', p[3], p[6])


def p_switch_case_1(p):
    'switch_case : CASE primary COLON statements_or_empty'
    p[0] = ('switch_case', p[2], p[4])


def p_switch_case_2(p):
    'switch_case : DEFAULT COLON statements_or_empty'
    p[0] = ('default_case', p[3])


def p_switch_cases_1(p):
    'switch_cases : switch_cases switch_case'
    p[0] = p[1] + [p[2]]


def p_switch_cases_2(p):
    'switch_cases : switch_case'
    p[0] = [p[1]]


def p_switch_cases_or_empty(p):
    '''switch_cases_or_empty : switch_cases
                             | empty'''
    p[0] = p[1]

Any help or guidance on how to resolve this conflict will be greatly appreciated! Thank you in advance.

Answer 1

What you have here is a simple shift/reduce conflict (with the token default as lookahead) being resolved as a shift.

Let's reduce this all to a much smaller, if not minimal, example. Here's the grammar, partly based on the one in the GitHub repository pointed to in the OP (but intended to be self-contained):

statements: statements statement |
statement : assign SEMICOLON
          | switch
assign    : lvalue EQUALS expression
switch    : SWITCH LPAREN expression RPAREN LCURLY cases RCURLY
cases     : cases case | 
case      : CASE expression COLON statements
          | DEFAULT COLON statements
expression: ID | INT
lvalue    : ID | DEFAULT

The key here is that a statement might start with the token DEFAULT, and a case might also start with the token DEFAULT. Now, suppose we've reached the following point in the parse:

switch ( <expression> ) { <cases> case <expression> : <statements>

so we're in the middle of a switch compound statement; we've seen case 0: and we're working on a list of statements. The current state includes the items (there are a few more; I only include the relevant ones):

1. statements: statements · statement
2. case      : CASE expression COLON statements ·
3. statement : · assign SEMICOLON
4. assign    : · lvalue EQUALS expression
5. lvalue    : · DEFAULT

The lookahead set for item 2 is [ RCURLY, CASE, DEFAULT ]: a finished case can only be followed by the closing brace or by the start of another case.

Now suppose the next token is default. We could be looking at the start of a statement, if the default is followed by =. Or we could be looking at a new case clause, if the default is followed by :. But we can't see two tokens into the future, only one; the next token is default and that is all we know.

But we need to make a decision:

If the default is the beginning of a statement, we can just shift it (item 5). Then when we see the =, we'll reduce default to lvalue and continue to parse the assign.
If the default is the beginning of a case, we need to reduce CASE expression COLON statements to case (item 2). We will then reduce cases case to cases before we finally shift the default. We will then shift the : and continue with DEFAULT COLON statements.

Like most LR parser generators, PLY resolves shift/reduce conflicts in favour of the shift, so it will always take the first of the two options above. If it then sees : instead of =, it will report a syntax error.
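To make the dependence on the second token concrete, here is a minimal sketch in plain Python. It is only an illustration, not how PLY works internally: the function name classify_default is hypothetical, and the token names are the ones used in the grammars above. The decision about a DEFAULT token can only be made by looking at the token after it, which is exactly the information a one-token-lookahead parser does not have.

```python
def classify_default(tokens, i):
    """Say what the DEFAULT token at tokens[i] starts, peeking one token further."""
    if tokens[i] != "DEFAULT":
        raise ValueError("tokens[i] must be DEFAULT")
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    if following == "COLON":
        # A new case label: the current case must be reduced first.
        return "case_label"
    if following in ("EQUALS", "PERIOD"):
        # An lvalue: EQUALS in the reduced grammar, PERIOD in the OP's grammar.
        return "statement"
    return "syntax_error"


print(classify_default(["DEFAULT", "COLON", "BREAK"], 0))
print(classify_default(["DEFAULT", "PERIOD", "ID"], 0))
```

An LR(1) parser has to commit when it sees DEFAULT itself, before the token stored in following is available.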

So what we have is just another example of an LR(2) grammar. LR(2) grammars can always be rewritten as LR(1) grammars, but the rewritten grammar is often ugly and bloated. Here is one possible solution, which is possibly less ugly than most.

The body of a switch, using EBNF operators |, * and + (alternation, optional repetition, and repetition) is:

switch-body -> (("case" expression | "default") ":" statement*)+

Or, to make it a little less cumbersome:

case-header -> ("case" expression | "default") ":"
switch-body -> (case-header statement*)+

From the perspective of accepted strings, that's exactly the same as

switch-body -> case-header (case-header | statement)*

In other words, a sequence of things which are either case-headers or statements, where the first one is a case-header.

This way of writing the rule does not generate the correct parse tree; it simplifies the structure of a switch statement into a soup of statements and case labels. But it does recognise exactly the same language.
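The claim that both forms accept the same strings can be spot-checked by brute force. The sketch below assumes a hypothetical one-letter encoding ('h' for a case-header, 's' for a statement), writes the two forms of switch-body as regular expressions, and compares them on every string up to a given length. It is a sanity check on short inputs, not a proof.

```python
import itertools
import re

# Hypothetical encoding: 'h' = case-header, 's' = statement.
ORIGINAL = re.compile(r"(hs*)+")     # (case-header statement*)+
REWRITTEN = re.compile(r"h(h|s)*")   # case-header (case-header | statement)*


def same_language(max_len):
    """Return True if both patterns agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in itertools.product("hs", repeat=n):
            text = "".join(symbols)
            if bool(ORIGINAL.fullmatch(text)) != bool(REWRITTEN.fullmatch(text)):
                return False
    return True


print(same_language(10))  # True
```

Both patterns describe the non-empty strings over 'h' and 's' that begin with 'h', which is why they agree.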

On the plus side, it has the virtue of not forcing the parser to decide when a case clause has terminated. (The grammar no longer has case clauses.) So it is a simple LR(1) grammar:

switch       : SWITCH LPAREN expression RPAREN LCURLY switch_body RCURLY
switch_body  : case_header
             | switch_body statement
             | switch_body case_header
case_header  : CASE expression COLON
             | DEFAULT COLON

Now, we could make the argument that the resulting parse tree is, in fact, accurate. Unrealscript shares the same design decision about switch statements as C, in which a case clause does not actually define a block in any real sense of the word. It is simply a label which can be jumped to, and a conditional jump to the next label.

But it is actually not particularly complicated to fix the parse tree as we go, because each reduction to switch_body clearly indicates what we're adding. If we're adding a case-header, we append a new clause, with an empty statement list, to the accumulating list of case clauses; if it's a statement, we append the statement to the end of the last case clause.

So we could write the above rules in PLY roughly as follows:

def p_switch_body_1(p):
    ''' switch_body  : case_header '''
    p[0] = [p[1]]

def p_switch_body_2(p):
    ''' switch_body  : switch_body statement '''
    # Append the statement to the list which is the last element of
    # the tuple which is the last element of the list which is the
    # semantic value of symbol 1, the switch_body.
    p[1][-1][-1].append(p[2])
    p[0] = p[1]

def p_switch_body_3(p):
    ''' switch_body  : switch_body case_header '''
    # Add the new case header (symbol 2), whose statement list
    # is initially empty, to the list of switch clauses.
    p[1].append(p[2])
    p[0] = p[1]

def p_case_header_1(p):
    ''' case_header  : CASE expression COLON '''
    p[0] = ('switch_case', p[2], [])

def p_case_header_2(p):
    ''' case_header  : DEFAULT COLON '''
    p[0] = ('default_case', [])

Answer 1 (2, accepted, 2016-10-01 22:10:35)