
Purpose of FIRST and FOLLOW sets in LL(1) parsers?

Asked 2013-12-01 20:58:50. Tags: parsing, context-free-grammar, ll

Can anyone explain to me how FIRST and FOLLOW should be used with an LL(1) grammar? I understand that they are used for syntax table construction, but I don't understand how.

1 Answer

An LL(1) parser works by maintaining a workspace, initially seeded with the start symbol followed by the end-of-string marker (usually denoted $). At each step, it does one of the following:

If the first symbol of the workspace is a terminal, it matches it against the next token of input (or reports an error if it doesn't match).
If the first symbol of the workspace is a nonterminal, it predicts which production to replace that nonterminal with.

The predict step is where FIRST and FOLLOW show up. The parser needs to be able to guess, based purely on the current nonterminal and the next token of input, which production to use. The question is how to do this.
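To make the match/predict loop concrete, here is a minimal sketch in Python. The toy grammar S → ( S ) S | ε, the hand-written TABLE, and the names parse, TABLE and NONTERMINALS are all assumptions made up for illustration; the rest of this answer is about how such a table can be derived rather than written by hand.

```python
END = "$"

# Hypothetical hand-written predict table for the toy grammar
#   S -> ( S ) S | ε
# Key: (nonterminal, next input token). Value: right-hand side to expand to.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],   # S -> ε
    ("S", END): [],   # S -> ε
}
NONTERMINALS = {"S"}


def parse(tokens, start="S"):
    """Return True if the token sequence is accepted, False otherwise."""
    tokens = list(tokens) + [END]
    workspace = [start, END]  # workspace[0] is the first symbol
    pos = 0
    while workspace:
        top = workspace.pop(0)
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            # Predict: pick a production from the nonterminal and next token.
            production = TABLE.get((top, lookahead))
            if production is None:
                return False  # no production predicted: syntax error
            workspace = production + workspace
        elif top == lookahead:
            # Match: consume one token of input.
            pos += 1
        else:
            return False  # terminal mismatch: syntax error
    return pos == len(tokens)
```

With this table, parse("(())()") accepts and parse("(()") rejects, because the final ")" in the workspace has nothing left to match except the end marker.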

Let's suppose that the current nonterminal is A and the next token of input is t. If you know the productions of A, which one would you choose to apply? There's one simple case to consider: if there's a production of the form A → tω, where ω is some arbitrary string, then you should pick that production, because the t you're looking at as input will match the t at the front of the production.

There are also some more complex cases to consider. Suppose you have a production of the form A → Bω, where B is a nonterminal and ω is some string. Under what circumstances would you want to guess this production? Well, if you know that the next terminal symbol is a t, you wouldn't want to guess this production unless you knew that B can expand to a string that starts with the terminal t (there's another case that we'll talk about in a second).

This is where FIRST sets come in. In grammars without ε productions, the set FIRST(X) for some nonterminal X is the set of all terminals that can potentially appear at the start of some string derived from X. If you have a production of the form A → Bω and you see the terminal t, you'd guess that production precisely when t ∈ FIRST(B); that is, when B can derive some string that starts with t. If B doesn't derive anything starting with t, then there's no reason to choose it, and if B does derive something starting with t, you'd want to make this choice so that you could eventually match the t against it.
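As a rough sketch of that idea, FIRST sets for a grammar with no ε productions can be computed by repeating "FIRST(A) includes FIRST of the first symbol of each production of A" until nothing changes. The example GRAMMAR and the function names first_sets_no_epsilon and predict are hypothetical, and the code assumes every right-hand side is non-empty.

```python
# Hypothetical example grammar with no ε productions:
#   A -> B x | t y
#   B -> b | c B
# Keys are nonterminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "A": [["B", "x"], ["t", "y"]],
    "B": [["b"], ["c", "B"]],
}


def first_sets_no_epsilon(grammar):
    """Compute FIRST for each nonterminal, assuming no ε productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                head = rhs[0]
                # A terminal starts with itself; a nonterminal contributes
                # whatever is currently known about its FIRST set.
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def predict(grammar, first, nonterminal, token):
    """Return the production whose first symbol can start with `token`."""
    for rhs in grammar[nonterminal]:
        head = rhs[0]
        if token in (first[head] if head in grammar else {head}):
            return rhs
    return None  # no production can start with this token


FIRST = first_sets_no_epsilon(GRAMMAR)
```

Here FIRST["B"] is {"b", "c"}, so on seeing c the parser would predict A → B x, while on seeing t it would predict A → t y.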

Things get a bit trickier when ε productions are introduced. Now, let's suppose that you have a production of the form A → BCω, where B and C are nonterminals and ω is a string. Let's also suppose the next token of input is t. If t ∈ FIRST(B), then we'd choose this production, as mentioned above. However, what happens if t ∉ FIRST(B)? If there are ε productions in the grammar, we might still want to choose this production if B can derive ε and t ∈ FIRST(C). Why is this? If this happens, it means that we might be able to match the t by producing BCω, then producing ε from B, leaving Cω against which to match the t. This is one context where we might have to "look through" a nonterminal. Fortunately, this is handled by FIRST sets. If a nonterminal X can produce ε, then ε ∈ FIRST(X). Therefore, we can use FIRST sets to check whether we need to "look through" a nonterminal by seeing whether ε ∈ FIRST(X).
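The "look through" rule can be sketched as follows. This is one common fixed-point formulation, not the only one; the representation (an ε production is an empty list, ε itself is the string "ε") and the names first_sets, first_of_string and GRAMMAR are assumptions for illustration.

```python
EPS = "ε"

# Hypothetical example grammar, with ε productions written as empty lists:
#   A -> B C w
#   B -> b | ε
#   C -> c | ε
GRAMMAR = {
    "A": [["B", "C", "w"]],
    "B": [["b"], []],
    "C": [["c"], []],
}


def first_of_string(symbols, first, grammar):
    """FIRST of a sequence of symbols, looking through nullable prefixes."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol could vanish (or the string is empty)
    return result


def first_sets(grammar):
    """Compute FIRST for each nonterminal, allowing ε productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
```

In this example both B and C can derive ε, so FIRST["A"] picks up b from B, c by looking through B, and w by looking through both B and C.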

So far we haven't talked about FOLLOW sets. Where do they come in? Well, suppose that we're processing the nonterminal A, we see the terminal t, but none of the productions for A can actually consume the t. What do we do then? It turns out there's still a way that we can eat up that t. Remember that LL(1) parsers work by maintaining a workspace with a string in it. It's possible that the t we're looking at is not supposed to be matched against the current nonterminal A at all, and instead we're supposed to have A produce ε and then let some later symbol in the workspace match against the t. This is where FOLLOW sets come in. The FOLLOW set of a nonterminal X, denoted FOLLOW(X), is the set of all terminal symbols that can appear immediately after X in some derivation. When choosing which production to use for A, we add in a final rule: if the terminal symbol t is in the FOLLOW set of A, we choose some production that ultimately will produce ε. That way, the A will "disappear" and we can match the t against some symbol that appears after the A nonterminal.
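Putting the pieces together, here is a sketch of how FIRST and FOLLOW might be combined into a predict table. It uses the usual textbook rule: a production A → ω is entered under every terminal in FIRST(ω), and, if ω can derive ε, also under every terminal in FOLLOW(A). The toy grammar S → ( S ) S | ε and all function names are assumptions for illustration, and the end marker $ is treated as following the start symbol.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """FIRST of a symbol string; symbols missing from `first` are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        # Everything after sym can vanish, so whatever
                        # follows nt can also follow sym.
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(grammar, start):
    first = first_sets(grammar)
    follow = follow_sets(grammar, first, start)
    table = {}
    for nt, productions in grammar.items():
        for rhs in productions:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:
                lookaheads |= follow[nt]  # the FOLLOW rule
            for t in lookaheads:
                if (nt, t) in table:
                    raise ValueError(f"not LL(1): conflict at ({nt}, {t})")
                table[(nt, t)] = rhs
    return table


# Toy grammar: S -> ( S ) S | ε   (ε production written as an empty list)
GRAMMAR = {"S": [["(", "S", ")", "S"], []]}
TABLE = build_table(GRAMMAR, "S")
```

For this grammar FOLLOW(S) contains ) and $, so the ε production is entered under both of those tokens, while ( selects S → ( S ) S. If two productions ever land in the same table cell, the grammar is not LL(1), which this sketch reports with a ValueError.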

This isn't a complete introduction to LL(1) parsing, but I hope it helps you see why we need FIRST and FOLLOW sets. For more information, pick up a book on parsing (I recommend Parsing Techniques: A Practical Guide by Grune and Jacobs) or take a course on compilers. As a totally shameless plug, I taught a compilers course in Summer 2012-2013 and all of the lecture slides are available online.