What is the difference between LR(0) and SLR parsing?

I am working on my compilers concepts; however, I am a little confused. Googling did not lead me to a definite answer.

Are SLR and LR(0) parsers one and the same? If not, what's the difference?

Both LR(0) and SLR(1) parsers are bottom-up, directional, predictive parsers. This means that

  • The parsers attempt to apply productions in reverse to reduce the input sentence back to the start symbol (bottom-up)
  • The parsers scan the input from left to right (directional)
  • The parsers attempt to predict what reductions to apply without necessarily seeing all of the input (predictive)

Both LR(0) and SLR(1) are shift/reduce parsers, meaning that they process the tokens of the input stream by placing them on a stack, and at each point either shifting a token by pushing it onto the stack or reducing some sequence of terminals and nonterminals atop the stack back to some nonterminal symbol. It can be shown that any grammar can be parsed bottom-up using a shift/reduce parser, but that parser might not be deterministic. That is, the parser may have to "guess" whether to apply a shift or reduction, and may end up having to backtrack to realize that it made the wrong choice. No matter how powerful a deterministic shift/reduce parser you construct, it will never be able to parse all grammars.

When a deterministic shift/reduce parser is used to parse a grammar that it cannot handle, it results in shift/reduce conflicts or reduce/reduce conflicts, where the parser may enter a state in which it cannot tell what action to take. In a shift/reduce conflict, it cannot tell whether it should add another symbol to the stack or perform some reduction on the top symbols of the stack. In a reduce/reduce conflict, the parser knows that it needs to replace the top symbols of the stack with some nonterminal, but it can't tell what reduction to use.

I apologize if this is a lengthy exposition, but we need this to be able to address the difference between LR(0) and SLR(1) parsing. An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose: either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).

Recall that the two possible LR conflicts are shift/reduce and reduce/reduce. In both of these cases, there are at least two actions that the LR(0) automaton could be taking, and it can't tell which of them to use. Since at least one of the conflicting actions is a reduction, a reasonable line of attack would be to try to have the parser be more careful about when it performs a particular reduction. More specifically, let's suppose that the parser is allowed to look at the next token of input to determine whether it should shift or reduce. If we only allow the parser to reduce when it "makes sense" to do so (for some definition of "makes sense"), then we may be able to eliminate the conflict by having the automaton specifically choose to either shift or reduce in a particular step.

In SLR(1) ("Simplified LR(1)"), the parser is allowed to look at one token of lookahead when deciding whether it should shift or reduce. In particular, when the parser wants to try reducing something of the form A → w (for nonterminal A and string w), it looks at the next token of input. If that token could legally appear after the nonterminal A in some derivation, the parser reduces. Otherwise, it does not. The intuition here is that in some cases it makes no sense to attempt a reduction, because given the tokens we've seen so far and the upcoming token, there is no possible way that the reduction could ever be correct.
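As a rough illustration of "could legally appear after the nonterminal A", here is a minimal sketch that computes FOLLOW sets for the usual expression grammar and uses them for the SLR(1) reduce check. The names `first_sets`, `follow_sets` and `slr_allows_reduce` are made up for this example, and the code assumes a grammar without empty productions.

```python
# Sketch only: FOLLOW sets for a grammar with no epsilon productions.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
END = "$"  # end-of-input marker


def first_sets(grammar):
    """FIRST set of every nonterminal (assumes no empty productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for rule in rules:
                head = rule[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW set of every nonterminal, by fixed-point iteration."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rules in grammar.items():
            for rule in rules:
                for i, sym in enumerate(rule):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    if i + 1 < len(rule):
                        nxt = rule[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # sym ends the rule
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def slr_allows_reduce(follow, lhs, lookahead):
    """SLR(1) rule: reduce by lhs -> w only if lookahead is in FOLLOW(lhs)."""
    return lookahead in follow[lhs]


follow = follow_sets(GRAMMAR, "E")
```

With this grammar, `slr_allows_reduce(follow, "E", "*")` is false, so an SLR(1) parser would never reduce T to E when the next token is `*`.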

The only difference between LR(0) and SLR(1) is this extra ability to help decide what action to take when there are conflicts. Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser. However, SLR(1) parsers can parse a larger number of grammars than LR(0).

In practice, though, SLR(1) is still a fairly weak parsing method. More commonly, you will see LALR(1) ("Lookahead LR(1)") parsers being used. They too work by trying to resolve conflicts in an LR(0) parser, but the rules they use for resolving conflicts are far more precise than those used in SLR(1), and consequently a much larger number of grammars are LALR(1) than are SLR(1). To be a bit more specific, SLR(1) parsers try to resolve conflicts by looking at the structure of the grammar to learn more information about when to shift and when to reduce. LALR(1) parsers look at both the grammar and the LR(0) parser to get even more specific information about when to shift and when to reduce. Because LALR(1) can look at the structure of the LR(0) parser, it can more precisely identify when certain conflicts are spurious. The Linux utilities yacc and bison, by default, produce LALR(1) parsers.

Historically, LALR(1) parsers were typically constructed through a different method that relied on the far more powerful LR(1) parser, so you will often see LALR(1) described that way. To understand this, we need to talk about LR(1) parsers. In an LR(0) parser, the parser works by keeping track of where it might be in the middle of a production. Once it has found that it's reached the end of a production, it knows to try to reduce. However, the parser might not be able to tell whether it's at the end of one production or in the middle of another, which leads to a shift/reduce conflict, or which of two different productions it has reached the end of (a reduce/reduce conflict). In LR(0), this immediately leads to a conflict and the parser fails. In SLR(1) or LALR(1), the parser then makes the decision to shift or reduce based on the next token of lookahead.

In an LR(1) parser, the parser keeps track of additional information as it operates. In addition to keeping track of what production the parser believes is being used, it keeps track of what possible tokens might appear after that production is completed. Because the parser keeps track of this information at each step, and not just when it needs to make the decision, the LR(1) parser is substantially more powerful and precise than any of the LR(0), SLR(1), or LALR(1) parsers we've talked about so far. LR(1) is an extremely powerful parsing technique, and it can be shown using some tricky math that any language that could be parsed deterministically by any shift/reduce parser has some grammar that could be parsed with an LR(1) automaton. (Note that this does not mean that all grammars that can be parsed deterministically are LR(1); this only says that a language that could be parsed deterministically has some LR(1) grammar.) However, this power comes at a price, and a generated LR(1) parser may require so much information to operate that it can't possibly be used in practice. An LR(1) parser for a real programming language, for example, might require tens to hundreds of megabytes of additional information to operate correctly. For this reason, LR(1) isn't typically used in practice, and weaker parsers like LALR(1) or SLR(1) are used instead.

More recently, a new parsing algorithm called GLR(0) ("Generalized LR(0)") has gained popularity. Rather than trying to resolve the conflicts that appear in an LR(0) parser, the GLR(0) parser instead works by trying all possible options in parallel. Using some clever tricks, this can be made to run very efficiently for many grammars. Moreover, GLR(0) can parse any context-free grammar at all, even grammars that can't be parsed by an LR(k) parser for any k. Other parsers are capable of doing this as well (for example, the Earley parser or a CYK parser), though GLR(0) tends to be faster in practice.

If you're interested in learning more, over this summer I taught an introductory compilers course and spent just under two weeks talking about parsing techniques. If you'd like to get a more rigorous introduction to LR(0), SLR(1), and a host of other powerful parsing techniques, you might enjoy my lecture slides and homework assignments about parsing. All of the course materials are available here on my personal site.

Hope this helps!

This is what I have learned. An LR(0) parsing table can contain conflicts: one cell of the table (which you derive in order to build the parser) can hold more than one action, or, to put it better, the parser can reach two different outcomes on the same input. The SLR parser is designed to remove these conflicts. To construct it, find every completed production that would produce a reduce entry, compute the FOLLOW set of the nonterminal on its left-hand side, and place the reduce only under the terminals that appear in that FOLLOW set. This in turn means that you never apply a reduction that is impossible under the original grammar (because the lookahead token is not in the FOLLOW set).

In the LR(0) parsing table, the reduce action for a production is placed across the entire row, under every terminal, whereas in the SLR parsing table it is placed only under the terminals in the FOLLOW set of the left-hand-side nonterminal of that production.
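The difference between the two tables can be sketched for a single row. The example below is only an illustration: the state is the one containing the items E → T . and T → T . * F from the usual expression grammar, the target state number 7 is arbitrary, the FOLLOW set is written out by hand, and the helper names (`reduce_columns`, `fill_row`, `conflicts`) are invented for this sketch.

```python
# Sketch only: fill one ACTION row under the LR(0) rule and the SLR(1) rule.
TERMINALS = ["+", "*", "(", ")", "id", "$"]

# FOLLOW(E) for the expression grammar, written out by hand for this sketch.
FOLLOW = {"E": {"+", ")", "$"}}


def reduce_columns(lhs, method):
    """Terminals under which a completed item for `lhs` gets a reduce entry."""
    if method == "LR0":
        return set(TERMINALS)               # whole row
    if method == "SLR1":
        return FOLLOW[lhs] & set(TERMINALS)  # only FOLLOW(lhs)
    raise ValueError(f"unknown method: {method}")


def fill_row(shifts, completed, method):
    """Build one row: `shifts` maps terminal -> target state,
    `completed` lists (lhs, rhs) pairs for items with the dot at the end."""
    row = {t: [] for t in TERMINALS}
    for terminal, target in shifts.items():
        row[terminal].append(f"s{target}")
    for lhs, rhs in completed:
        for terminal in reduce_columns(lhs, method):
            row[terminal].append(f"r({lhs} -> {' '.join(rhs)})")
    return row


def conflicts(row):
    """Cells that hold more than one action."""
    return {t: acts for t, acts in row.items() if len(acts) > 1}


# State with items  E -> T .   and   T -> T . * F
shifts = {"*": 7}                 # 7 is an arbitrary state number
completed = [("E", ["T"])]

lr0_row = fill_row(shifts, completed, "LR0")
slr_row = fill_row(shifts, completed, "SLR1")
```

Under the LR(0) rule the `*` cell holds both a shift and a reduce; under the SLR(1) rule the reduce is not placed under `*`, so the row is conflict-free.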

The tool called parsing-EMU is very helpful for studying parsing: it can generate FIRST and FOLLOW sets, LR(0) item sets, LALR evaluations, and so on. You can find it here.

Adding to the above answers, the difference between the individual parsers in the class of bottom-up parsers is whether they produce shift/reduce or reduce/reduce conflicts when the parsing tables are generated. The fewer conflicts a method produces, the more grammars it can handle and the more powerful it is (LR(0) < SLR(1) < LALR(1) < CLR(1)).

For example, consider the following expression grammar:

E → E + T

E → T

T → F

T → T * F

F → ( E )

F → id

It is not LR(0), but it is SLR(1). Using the following code, we can construct the LR(0) automaton and build the parsing table (we need to augment the grammar, compute the DFA using closures, and compute the action and goto sets):

from copy import deepcopy
import pandas as pd

def update_items(I, C):
    # Merge the items in C into the item set I, skipping duplicates.
    if len(I) == 0:
        return C
    for nt in C:
        Int = I.get(nt, [])
        for r in C.get(nt, []):
            if r not in Int:
                Int.append(r)
        I[nt] = Int
    return I

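def compute_action_goto(I, I0, sym, NTs):
    # Build the item set reached from item set I by reading the symbol sym.
    I1 = {}
    for NT in I:
        C = {}
        for r in I[NT]:
            r = r.copy()
            ix = r.index('.')
            # skip completed items (reduce step) and items not expecting sym
            if ix >= len(r)-1 or r[ix+1] != sym:
                continue
            r[ix:ix+2] = r[ix:ix+2][::-1]    # move the dot past sym
            # merge the closure into C rather than overwriting it, so that
            # several items of the same nonterminal reading sym are all kept
            C = update_items(C, compute_closure(r, I0, NTs))
            cnt = C.get(NT, [])
            if r not in cnt:
                cnt.append(r)
            C[NT] = cnt
        I1 = update_items(I1, C)
    return I1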

def construct_LR0_automaton(G, NTs, Ts):
    # get_start_state and to_str are helper functions not included in this listing
    I0 = get_start_state(G, NTs, Ts)
    I = deepcopy(I0)
    queue = [0]
    states2items = {0: I}
    items2states = {str(to_str(I)): 0}
    parse_table = {}
    cur = 0
    while len(queue) > 0:
        state_id = queue.pop(0)
        I = states2items[state_id]
        # compute goto set for non-terminals
        for NT in NTs:
            I1 = compute_action_goto(I, I0, NT, NTs)
            if len(I1) > 0:
                state = str(to_str(I1))
                if state not in items2states:
                    # new state: number it and schedule it for expansion
                    cur += 1
                    queue.append(cur)
                    states2items[cur] = I1
                    items2states[state] = cur
                    parse_table[state_id, NT] = cur
                else:
                    parse_table[state_id, NT] = items2states[state]
        # compute actions for terminals similarly
        # ... ... ...

    return states2items, items2states, parse_table
        
states, statess, parse_table = construct_LR0_automaton(G, NTs, Ts)

where the grammar G and the non-terminal and terminal symbols are defined as below:

G = {}
NTs = ['E', 'T', 'F']
Ts = {'+', '*', '(', ')', 'id'}
G['E'] = [['E', '+', 'T'], ['T']]
G['T'] = [['T', '*', 'F'], ['F']]
G['F'] = [['(', 'E', ')'], ['id']]

Here are a few more useful functions I implemented along with the above ones for LR(0) parsing table generation:

def augment(G, S): # start symbol S
    G[S + '1'] = [[S, '$']]
    NTs.append(S + '1')
    return G, NTs

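def compute_closure(r, G, NTs):
    # Closure of item r. G maps each nonterminal to its items with the dot
    # at the start (the function is called with I0 above).
    S = {}
    queue = [r]
    seen = []
    while len(queue) > 0:
        r = queue.pop(0)
        seen.append(r)
        ix = r.index('.') + 1
        if ix < len(r) and r[ix] in NTs:
            # copy the list so that later appends do not modify G itself
            S[r[ix]] = list(G[r[ix]])
            for rr in G[r[ix]]:
                if rr not in seen and rr not in queue:
                    queue.append(rr)
    return S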
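The helpers `get_start_state` and `to_str` are called above but their definitions are not part of this post. The following is only a guess at what they might look like, inferred from how they are used: items are lists containing a `'.'` marker, and `to_str` has to give the same result for equal item sets regardless of ordering. It also assumes that every nonterminal is reachable from the start symbol, so the start state contains every production with the dot at the front.

```python
# Hypothetical helper definitions (not from the original post).

def get_start_state(G, NTs, Ts):
    """Assumed form of the start state: every production of every
    nonterminal, with the dot placed in front. Ts is unused here and is
    kept only to match the call signature used above."""
    return {nt: [['.'] + list(rule) for rule in G[nt]] for nt in NTs if nt in G}


def to_str(I):
    """Order-independent, printable form of an item set, so that two
    equal item sets map to the same dictionary key."""
    return sorted((nt, sorted(' '.join(r) for r in rules))
                  for nt, rules in I.items())


# Small example: S -> A, A -> a A b | c
G_example = {'S': [['A']], 'A': [['a', 'A', 'b'], ['c']]}
I0_example = get_start_state(G_example, ['S', 'A'], {'a', 'b', 'c'})
```

Sorting inside `to_str` is a design choice for this sketch: the automaton construction compares states by their string form, so the representation must not depend on the order in which items were discovered.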

The following figure shows the LR(0) DFA constructed for the grammar using the above code:

[Image: LR(0) DFA for the expression grammar]

The following table shows the LR(0) parsing table generated as a pandas DataFrame. Notice that there are a couple of shift/reduce conflicts, indicating that the grammar is not LR(0).

[Image: LR(0) parsing table for the expression grammar]

The SLR(1) parser avoids the above shift/reduce conflicts by reducing only if the next input token is a member of the FOLLOW set of the nonterminal being reduced. So the above grammar is not LR(0), but it is SLR(1).

However, the following grammar, which accepts strings of the form a^n c b^n with n >= 0, is LR(0):

A → a A b

A → c

S → A

Let's define the grammar as follows:

# S --> A 
# A --> a A b | c
G = {}
NTs = ['S', 'A']
Ts = {'a', 'b', 'c'}
G['S'] = [['A']]
G['A'] = [['a', 'A', 'b'], ['c']]

[Image]

As can be seen from the following figure, there is no conflict in the parsing table generated.

[Image: LR(0) parsing table for the a^n c b^n grammar]
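To make the "no conflict" claim concrete, here is a small hand-built LR(0) driver for this grammar. It is a sketch, not the output of the code above: the state numbers are my own and may differ from the ones in the figure, and the names `SHIFT`, `REDUCE`, `ACCEPT_STATE` and `parse` are invented for this example. Each state either shifts or reduces, and the choice never depends on the next token.

```python
# Sketch only: hand-built LR(0) automaton for  S -> A,  A -> a A b | c
# States (numbering is arbitrary):
#   0: S -> .A, A -> .aAb, A -> .c      3: A -> c.
#   1: S -> A.  (accept)                4: A -> aA.b
#   2: A -> a.Ab, A -> .aAb, A -> .c    5: A -> aAb.

# Transitions on terminals (shift) and on the nonterminal A (goto).
SHIFT = {
    (0, "a"): 2, (0, "c"): 3, (0, "A"): 1,
    (2, "a"): 2, (2, "c"): 3, (2, "A"): 4,
    (4, "b"): 5,
}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 1), 5: ("A", 3)}
ACCEPT_STATE = 1


def parse(tokens):
    """Return True if tokens form a sentence of the grammar."""
    stack = [0]
    pos = 0
    while True:
        state = stack[-1]
        if state == ACCEPT_STATE:
            return pos == len(tokens)   # accept only at end of input
        if state in REDUCE:
            # Reduce without looking at the next token (zero lookahead).
            lhs, length = REDUCE[state]
            del stack[-length:]
            target = SHIFT.get((stack[-1], lhs))
            if target is None:
                return False
            stack.append(target)
            continue
        if pos == len(tokens):
            return False                # input ended too early
        target = SHIFT.get((state, tokens[pos]))
        if target is None:
            return False                # no transition: syntax error
        stack.append(target)
        pos += 1
```

For example, `parse(list("aacbb"))` succeeds while `parse(list("acbb"))` fails. Because `REDUCE` is keyed by state alone, there is no cell in which a shift and a reduce could compete, which is what it means for the grammar to be LR(0).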
