
What is the difference between LR, SLR, and LALR parsers?

What is the actual difference between LR, SLR, and LALR parsers? I know that SLR and LALR are types of LR parsers, but what is the actual difference as far as their parsing tables are concerned?

And how do we show whether a grammar is LR, SLR, or LALR? For an LL grammar we just have to show that no cell of the parsing table contains multiple production rules. Are there any similar rules for LALR, SLR, and LR?

For example, how can we show that the grammar

S --> Aa | bAc | dc | bda
A --> d

is LALR(1) but not SLR(1)?


EDIT (ybungalobill): I didn't get a satisfactory answer for what's the difference between LALR and LR. So LALR's tables are smaller in size, but it can recognize only a subset of LR grammars. Can someone elaborate more on the difference between LALR and LR, please? LALR(1) and LR(1) will be sufficient for an answer. Both of them use 1-token lookahead and both are table driven! How are they different?

SLR, LALR and LR parsers can all be implemented using exactly the same table-driven machinery.

Fundamentally, the parsing algorithm collects the next input token T and consults the current state S (and its associated lookahead, GOTO, and reduction tables) to decide what to do:

  • SHIFT: If the current table says to SHIFT on the token T, the pair (S,T) is pushed onto the parse stack, the state is changed according to what the GOTO table says for the current token (e.g., GOTO(T)), another input token T' is fetched, and the process repeats.
  • REDUCE: Every state has 0, 1, or many possible reductions that might occur in it. If the parser is LR or LALR, the token is checked against the lookahead sets for all valid reductions for the state. If the token matches the lookahead set for a reduction by grammar rule G = R1 R2 .. Rn, a stack reduction and shift occurs: the semantic action for G is called, the stack is popped n times (once per Ri), the pair (S,G) is pushed onto the stack, the new state S' is set to GOTO(G), and the cycle repeats with the same token T. If the parser is an SLR parser, there is at most one reduction rule for the state, so the reduction action can be done blindly without searching to see which reduction applies. It is useful for an SLR parser to know whether there is a reduction or not; this is easy to tell if each state explicitly records the number of reductions associated with it, and that count is needed for the L(AL)R versions in practice anyway.
  • ERROR: If neither SHIFT nor REDUCE is possible, a syntax error is declared.
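To make the SHIFT/REDUCE/ERROR loop concrete, here is a minimal sketch of the table-driven driver in Python. The toy grammar (S → ( S ) | x), the state numbering, and the ACTION/GOTO encoding are my own illustration, not from any particular generator:

```python
# Hand-built tables for the toy grammar  S -> ( S ) | x  (an LR(0) grammar).
# ACTION[state][token] is ('shift', next_state), ('reduce', lhs, rhs_len),
# or ('accept',); GOTO[state, nonterminal] gives the state after a reduction.
ACTION = {
    0: {'(': ('shift', 2), 'x': ('shift', 3)},
    1: {'$': ('accept',)},
    2: {'(': ('shift', 2), 'x': ('shift', 3)},
    3: {t: ('reduce', 'S', 1) for t in '()x$'},   # item S -> x .
    4: {')': ('shift', 5)},
    5: {t: ('reduce', 'S', 3) for t in '()x$'},   # item S -> ( S ) .
}
GOTO = {(0, 'S'): 1, (2, 'S'): 4}

def parse(tokens):
    """Generic table-driven LR driver: SHIFT, REDUCE, or ERROR."""
    stack = [0]                      # stack of states
    toks = list(tokens) + ['$']      # '$' is the end-of-input marker
    i = 0
    while True:
        action = ACTION.get(stack[-1], {}).get(toks[i])
        if action is None:           # ERROR: no entry for (state, token)
            return False
        if action[0] == 'accept':
            return True
        if action[0] == 'shift':     # push the new state, consume the token
            stack.append(action[1])
            i += 1
        else:                        # REDUCE: pop |rhs| states, then GOTO
            _, lhs, n = action
            del stack[len(stack) - n:]
            stack.append(GOTO[stack[-1], lhs])
```

For example, `parse('((x))')` succeeds while `parse('(x')` hits the ERROR case. The same driver runs SLR, LALR, or canonical LR tables; only the table contents differ.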

So, if they all use the same machinery, what's the point?

The purported value of SLR is its simplicity of implementation; you don't have to scan through the possible reductions checking lookahead sets because there is at most one, and it is the only viable action if there are no SHIFT exits from the state. Which reduction applies can be attached specifically to the state, so the SLR parsing machinery doesn't have to hunt for it. In practice L(AL)R parsers handle a usefully larger set of languages, and are so little extra work to implement that nobody implements SLR except as an academic exercise.

The difference between LALR and LR has to do with the table generator. LR parser generators keep track of all possible reductions from specific states and their precise lookahead sets; you end up with states in which every reduction is associated with its exact lookahead set from its left context. This tends to build rather large sets of states. LALR parser generators are willing to combine states if the GOTO tables and lookahead sets for reductions are compatible and don't conflict; this produces considerably smaller numbers of states, at the price of not being able to distinguish certain symbol sequences that LR can distinguish. So, LR parsers can parse a larger set of languages than LALR parsers, but have very much bigger parser tables. In practice, one can find LALR grammars which are close enough to the target languages that the size of the state machine is worth optimizing; the places where the LR parser would be better are handled by ad hoc checking outside the parser.

So: all three use the same machinery. SLR is "easy" in the sense that you can ignore a tiny bit of the machinery, but it is just not worth the trouble. LR parses a broader set of languages, but the state tables tend to be pretty big. That leaves LALR as the practical choice.

Having said all this, it is worth knowing that GLR parsers can parse any context-free language, using more complicated machinery but exactly the same tables (including the smaller version used by LALR). This means that GLR is strictly more powerful than LR, LALR and SLR; pretty much if you can write a standard BNF grammar, GLR will parse according to it. The difference in the machinery is that GLR is willing to try multiple parses when there are conflicts between the GOTO table and/or lookahead sets. (How GLR does this efficiently is sheer genius [not mine] but won't fit in this SO post.)

That for me is an enormously useful fact. I build program analyzers and code transformers, and parsers are necessary but "uninteresting"; the interesting work is what you do with the parsed result, so the focus is on doing the post-parsing work. Using GLR means I can relatively easily build working grammars, compared to hacking a grammar to get it into LALR-usable form. This matters a lot when trying to deal with non-academic languages such as C++ or Fortran, where you literally need thousands of rules to handle the entire language well, and you don't want to spend your life trying to hack the grammar rules to meet the limitations of LALR (or even LR).

As a sort of famous example, C++ is considered to be extremely hard to parse... by guys doing LALR parsing. C++ is straightforward to parse using GLR machinery using pretty much the rules provided in the back of the C++ reference manual. (I have precisely such a parser, and it handles not only vanilla C++, but also a variety of vendor dialects as well. This is only possible in practice because we are using a GLR parser, IMHO.)

[EDIT November 2011: We've extended our parser to handle all of C++11. GLR made that a lot easier to do. EDIT Aug 2014: Now handling all of C++17. Nothing broke or got worse; GLR is still the cat's meow.]

LALR parsers merge similar states within an LR grammar to produce parser state tables that are exactly the same size as for the equivalent SLR grammar, and which are usually an order of magnitude smaller than pure LR parsing tables. However, for LR grammars that are too complex to be LALR, these merged states result in parser conflicts, or produce a parser that does not fully recognize the original LR grammar.

BTW, I mention a few things about this in my MLR(k) parsing table algorithm here.

Addendum

The short answer is that the LALR parsing tables are smaller, but the parser machinery is the same. A given LALR grammar will produce much larger parsing tables if all of the LR states are generated, with a lot of redundant (near-identical) states.

The LALR tables are smaller because the similar (redundant) states are merged together, effectively throwing away context/lookahead info that the separate states encode. The advantage is that you get much smaller parsing tables for the same grammar.

The drawback is that not all LR grammars can be encoded as LALR tables because more complex grammars have more complicated lookaheads, resulting in two or more states instead of a single merged state.

The main difference is that the algorithm to produce LR tables carries more info around between the transitions from state to state while the LALR algorithm does not. So the LALR algorithm cannot tell if a given merged state should really be left as two or more separate states.
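The textbook illustration of what that merging loses can be sketched in a few lines. This is my own sketch, using the standard LR(1)-but-not-LALR(1) grammar S → aAd | bBd | aBe | bAe, A → c, B → c; the item and state encodings are made up for the illustration:

```python
# In the LR(1) machine, the two states reached after "a c" and after "b c"
# have the same core items (A -> c.  and  B -> c.) but different lookaheads:
state_after_ac = {('A', ('c',)): {'d'}, ('B', ('c',)): {'e'}}
state_after_bc = {('A', ('c',)): {'e'}, ('B', ('c',)): {'d'}}

# LALR merges same-core states by unioning their lookahead sets:
merged = {core: state_after_ac[core] | state_after_bc[core]
          for core in state_after_ac}

def reduce_reduce_conflicts(state):
    # pairs of reductions whose lookahead sets overlap
    items = list(state.items())
    return [(a[0], b[0]) for i, a in enumerate(items)
            for b in items[i + 1:] if a[1] & b[1]]
```

Neither LR(1) state has a conflict, but the merged state must choose between reducing A → c and B → c on both d and e: a reduce/reduce conflict introduced purely by the merge.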

Yet another answer (YAA).

The parsing algorithms for SLR(1), LALR(1) and LR(1) are identical, as Ira Baxter said;
however, the parser tables may be different because of the parser-generation algorithm.

An SLR parser generator creates an LR(0) state machine and computes the look-aheads from the grammar (FIRST and FOLLOW sets). This is a simplified approach and may report conflicts that do not really exist in the LR(0) state machine.
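As an illustration of the FOLLOW half of that computation, here is a small fixpoint sketch. This is my own code, not from any generator, and it assumes the grammar has no ε-productions (true of the example grammars in this question):

```python
def compute_follow(G, start):
    """FOLLOW sets by fixpoint iteration, for grammars with no
    epsilon productions and no left recursion on a leading symbol."""
    def first(symbol):
        if symbol not in G:                       # terminal
            return {symbol}
        return set().union(*(first(rhs[0]) for rhs in G[symbol]))

    follow = {nt: set() for nt in G}
    follow[start].add('$')                        # end-of-input follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhss in G.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in G:              # only nonterminals get FOLLOW sets
                        continue
                    # next symbol's FIRST, or FOLLOW(lhs) if sym ends the production
                    new = first(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

# The grammar from the question: S -> Aa | bAc | dc | bda,  A -> d
G = {'S': [['A', 'a'], ['b', 'A', 'c'], ['d', 'c'], ['b', 'd', 'a']],
     'A': [['d']]}
follow = compute_follow(G, 'S')
```

For the question's grammar this gives FOLLOW(A) = {a, c}; since a is also shiftable in the state containing S → bd•a and A → d•, the SLR table gets a shift-reduce conflict there.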

An LALR parser generator creates an LR(0) state machine and computes the look-aheads from the LR(0) state machine (via the terminal transitions). This is a correct approach, but occasionally reports conflicts that would not exist in an LR(1) state machine.

A Canonical LR parser generator computes an LR(1) state machine and the look-aheads are already part of the LR(1) state machine. These parser tables can be very large.

A Minimal LR parser generator computes an LR(1) state machine, but merges compatible states during the process, and then computes the look-aheads from the minimal LR(1) state machine. These parser tables are the same size or slightly larger than LALR parser tables, giving the best solution.

LRSTAR 10.0 can generate LALR(1), LR(1), CLR(1) or LR(*) parsers in C++, whatever is needed for your grammar. See this diagram which shows the difference among LR parsers.

[Full disclosure: LRSTAR is my product]

Suppose a parser without a lookahead is happily parsing strings for your grammar.

Using your given example, it comes across a string dc; what does it do? Does it reduce it to S, because dc is a valid string produced by this grammar? OR maybe we were trying to parse bdc, because even that is an acceptable string?

As humans we know the answer is simple; we just need to remember whether we had just parsed b or not. But computers are stupid :)

Since an SLR(1) parser has the additional power over LR(0) to perform a lookahead, we know that any amount of lookahead cannot tell us what to do in this case; instead, we need to look back into our past. Thus comes the canonical LR parser to the rescue. It remembers the past context.

The way it remembers this context is that it disciplines itself: whenever it encounters a b, it will start walking on a path towards reading bdc, as one possibility. So when it sees a d, it knows whether it is already walking a path. Thus a CLR(1) parser can do things an SLR(1) parser cannot!

But now, since we had to define so many paths, the states of the machine get very large!

So we merge similar-looking paths; as expected, this can give rise to problems of confusion. However, we are willing to take that risk in exchange for the reduced size.

This is your LALR(1) parser.


Now, how to do it algorithmically.

When you draw the configuring sets for the above language, you will see a shift-reduce conflict in two states. To remove them you might want to consider an SLR(1) parser, which makes its decisions by looking at the follow sets, but you would observe that it still won't be able to. Thus you would draw the configuring sets again, but this time with the restriction that whenever you calculate the closure, the additional productions being added must have strict follow(s). Refer to any textbook on what these follows should be.

SLR parsers recognize a proper subset of grammars recognizable by LALR(1) parsers, which in turn recognize a proper subset of grammars recognizable by LR(1) parsers.

Each of these is constructed as a state machine, with each state representing some set of the grammar's production rules (and a position in each) as it parses the input.

The Dragon Book example of an LALR(1) grammar that is not SLR is this:

S → L = R | R
L → * R | id
R → L

Here is one of the states for this grammar:

S → L•= R
R → L•

The • indicates the position of the parser in each of the possible productions. It doesn't know which of the productions it's actually in until it reaches the end and tries to reduce.

Here, the parser could either shift an = or reduce R → L.

An SLR parser (built on the LR(0) state machine) would determine whether it could reduce by checking if the next input symbol is in the follow set of R (i.e., the set of all terminals in the grammar that can follow R). Since = is also in this set, the SLR parser encounters a shift-reduce conflict.

However, an LALR(1) parser would use the set of all terminals that can follow this particular production of R, which is only $ (i.e., end of input). Thus, no conflict.
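You can check the FOLLOW computation behind that conflict by solving the grammar's FOLLOW-set inclusions by fixpoint iteration. The inclusions below are hand-derived (by me) from S → L = R | R, L → * R | id, R → L:

```python
# FOLLOW(L) includes {'='}       : L is followed by '=' in S -> L = R
# FOLLOW(L) includes FOLLOW(R)   : R -> L puts L at the end of R's production
# FOLLOW(R) includes {'$'}       : R ends S -> L = R and S -> R, FOLLOW(S) = {'$'}
# FOLLOW(R) includes FOLLOW(L)   : L -> * R puts R at the end of L's production
follow_L, follow_R = set(), set()
while True:
    new_L = {'='} | follow_R
    new_R = {'$'} | follow_L
    if (new_L, new_R) == (follow_L, follow_R):   # fixpoint reached
        break
    follow_L, follow_R = new_L, new_R
# follow_R now contains '=', which is exactly why the SLR parser sees a
# shift-reduce conflict in the state holding R -> L. ; the LALR state's
# context-specific lookahead for that reduction is just {'$'}.
```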

As previous commenters have noted, LALR(1) parsers have the same number of states as SLR parsers. A lookahead propagation algorithm is used to tack lookaheads onto SLR state productions from the corresponding LR(1) states. The resulting LALR(1) parser can introduce reduce-reduce conflicts not present in the LR(1) parser, but it cannot introduce shift-reduce conflicts.

In your example, the following LALR(1) state causes a shift-reduce conflict in an SLR implementation:

S → b d•a / $
A → d• / c

The symbol after / is the follow set for each production in the LALR(1) parser. In SLR, follow(A) includes a, which could also be shifted.

The basic difference between the parser tables generated with SLR vs LR is that reduce actions are based on the Follows set for SLR tables. This can be overly restrictive, ultimately causing a shift-reduce conflict.

An LR parser, on the other hand, bases reduce decisions only on the set of terminals which can actually follow the non-terminal being reduced. This set of terminals is often a proper subset of the Follows set of such a non-terminal, and therefore has less chance of conflicting with shift actions.

LR parsers are more powerful for this reason. LR parsing tables can be extremely large, however.

An LALR parser starts with the idea of building an LR parsing table, but combines generated states in a way that results in significantly smaller table size. The downside is that a small chance of conflicts is introduced for some grammars that an LR table would otherwise have avoided.

LALR parsers are slightly less powerful than LR parsers, but still more powerful than SLR parsers. YACC and other such parser generators tend to use LALR for this reason.

PS For brevity, SLR, LALR and LR above really mean SLR(1), LALR(1), and LR(1), so one-token lookahead is implied.

One simple answer is that all LALR(1) grammars are LR(1) grammars. Compared to LALR(1), LR(1) has more states in the associated finite-state machine (more than double the states). And that is the main reason LALR(1) grammars require more code to detect syntax errors than LR(1) grammars. And one more important thing to know regarding these two grammar classes is that in LR(1) grammars we might have fewer reduce/reduce conflicts. But in LALR(1) there is more possibility of reduce/reduce conflicts.

In addition to the answers above, this diagram demonstrates how different parsers relate:

[image: diagram of how the parser classes relate]

Adding on top of the above answers, the difference between the classes of bottom-up parsers is whether they result in shift/reduce or reduce/reduce conflicts when generating the parsing tables. The fewer conflicts it has, the more powerful the grammar.

For example, consider the following expression grammar:

E → E + T

E → T

T → F

T → T * F

F → ( E )

F → id

It's not LR(0), but it is SLR(1). Using the following code, we can construct the LR(0) automaton and build the parse table:

from copy import deepcopy
import pandas as pd

def look_ahead(I, I0, sym, NTs): # read the next symbol
    #I0 = deepcopy(I0)
    I1 = {}
    for NT in I:
        C = {}
        for r in I[NT]:
            r = r.copy()
            ix = r.index('.')
            #if ix == len(r)-1: # reduce step
            if ix >= len(r)-1 or r[ix+1] != sym:
                continue
            r[ix:ix+2] = r[ix:ix+2][::-1]
            C = compute_closure(r, I0, NTs)
            cnt = C.get(NT, [])
            if not r in cnt:
                cnt.append(r)
            C[NT] = cnt
        if len(I1) == 0:
            I1 = C
        else:
            for nt in C:
                Int = I1.get(nt, [])
                for r in C.get(nt, []):
                    if not r in Int:
                        Int.append(r)
                I1[nt] = Int
    return I1

def construct_LR0_automaton(G, NTs, Ts):
    parse_table = {}
    I0 = get_start_state(G, NTs, Ts)
    I = deepcopy(I0)
    q = [0]
    states = {0: I}
    statess = {str(to_str(I)):0}
    trans = {}
    cur = 0
    reduces = {}
    while len(q) > 0:
        id = q.pop(0)
        I = states[id]
        for NT in NTs:
            I1 = look_ahead(I, I0, NT, NTs)
            if len(I1) > 0:
                state = str(to_str(I1))
                if not state in statess:
                    cur += 1
                    q.append(cur)
                    states[cur] = I1
                    statess[state] = cur
                    trans[id, NT] = cur
                else:
                    trans[id, NT] = statess[state]
        # compute lookahead for terminals too
        # ... ... ...
                    
    return states, statess, trans
        
states, statess, trans = construct_LR0_automaton(G, NTs, Ts)

where the grammar G and the non-terminal and terminal symbols are defined as below:

G = {}
NTs = ['E', 'T', 'F']
Ts = {'+', '*', '(', ')', 'id'}
G['E'] = [['E', '+', 'T'], ['T']]
G['T'] = [['T', '*', 'F'], ['F']]
G['F'] = [['(', 'E', ')'], ['id']]

Here are a few more useful functions I implemented, along with the ones above, for LR(0) parsing-table generation:

def augment(G, S):
    G[S + '1'] = [[S, '$']]
    NTs.append(S + '1')
    return G, NTs

def compute_closure(r, G, NTs):
    S = {}
    q = [r]
    seen = []
    while len(q) > 0:
        r = q.pop(0)
        seen.append(r)
        ix = r.index('.') + 1
        if ix < len(r) and r[ix] in NTs:
            S[r[ix]] = G[r[ix]]
            for rr in G[r[ix]]:
                if not rr in seen:
                    q.append(rr)
    return S

The following figure (expand it to view) shows the LR(0) DFA constructed for the grammar using the above code:

[image: LR(0) DFA for the expression grammar]

The following table shows the LR(0) parsing table generated as a pandas DataFrame. Notice that there are a couple of shift/reduce conflicts, indicating that the grammar is not LR(0).

[image: LR(0) parsing table with shift/reduce conflicts]

An SLR(1) parser avoids the above shift/reduce conflicts by reducing only if the next input token is a member of the Follow Set of the nonterminal being reduced.

The grammar from the question is not LR(0) either:

#S --> Aa | bAc | dc | bda
#A --> d    
G = {}
NTs = ['S', 'A']
Ts = {'a', 'b', 'c', 'd'}
G['S'] = [['A', 'a'], ['b', 'A', 'c'], ['d', 'c'], ['b', 'd', 'a']]
G['A'] = [['d']]

as can be seen from the following LR(0) DFA and parsing table:

[image: LR(0) DFA for the question's grammar]

there is a shift/reduce conflict again:

[image: LR(0) parsing table for the question's grammar, showing the shift/reduce conflict]

But the following grammar is LR(0):

A → a A b

A → c

S → A

# S --> A 
# A --> a A b | c
G = {}
NTs = ['S', 'A']
Ts = {'a', 'c'}
G['S'] = [['A']]
G['A'] = [['a', 'A', 'b'], ['c']]

[image: LR(0) DFA for this grammar]

As can be seen from the following figure, there is no conflict in the generated parsing table.

[image: conflict-free LR(0) parsing table]


 