
Method to calculate the predict set of a grammar production in a recursive descent parser

Original post: 2017-10-18 17:49:42 5 1 parsing/ compiler-construction/ context-free-grammar

I understand FIRST and FOLLOW sets, but I am totally lost on predict sets. Can someone explain to me how to go about finding the predict set of a production in a grammar using the FIRST and FOLLOW sets? I have not provided a grammar because this is for a homework assignment, and I want to know how to do it in general, not how to do it for this specific grammar.

1 answer

Intuitively, the predict set for a production A → α [Note 1] is the set of terminal symbols which might be the next symbol to be read if that production is to be predicted. (That implies that the production's non-terminal (A) has already been predicted, and the parser must now decide which of the non-terminal's productions to predict.)

Obviously, that includes all the terminal symbols which might be the first symbol of the right-hand side. But what if the right-hand side might derive ε, the empty string? In that case, the next symbol in the input will be the first symbol which comes after the predicted non-terminal, A; in other words, it will be a member of FOLLOW(A). So the predict set contains the terminals which might start the right-hand side α, plus all the symbols in FOLLOW(A) if α could derive the empty string. [Note 2]

More formally, PREDICT(A → α) is:

FIRST(α) if ε ∉ FIRST(α)
(FIRST(α) ∪ FOLLOW(A)) - {ε} if ε ∈ FIRST(α)
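
The two cases above translate almost directly into code. The following is a minimal Python sketch, assuming FIRST(α) and FOLLOW(A) have already been computed and are passed in as sets of terminal names; the function name `predict`, the `EPSILON` marker and the sample sets are hypothetical choices made for illustration.

```python
EPSILON = "ε"  # assumed marker for the empty string inside FIRST sets


def predict(first_alpha, follow_a):
    """Return PREDICT(A → α), given FIRST(α) and FOLLOW(A) as sets."""
    if EPSILON not in first_alpha:
        # α always produces at least one token, so FOLLOW(A) is irrelevant.
        return set(first_alpha)
    # α can derive ε, so anything that can follow A may be the next token.
    return (set(first_alpha) | set(follow_a)) - {EPSILON}


# Hypothetical sets for a non-nullable right-hand side.
print(sorted(predict({"b", "c"}, {"$"})))
# Hypothetical sets for a nullable right-hand side.
print(sorted(predict({"b", EPSILON}, {"d", "$"})))
```

Note that ε itself never appears in a predict set, since it is not a token the parser can read.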

Remember that we compute FIRST on a sentential form by "looking through" epsilons:

FIRST(aβ) is:

FIRST(a) if ε ∉ FIRST(a)
(FIRST(a) - {ε}) ∪ FIRST(β) if ε ∈ FIRST(a)

Consequently, FIRST of a right-hand side includes ε only if every symbol in the right-hand side is nullable.
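
The recursive rule for FIRST of a sequence can be written as a simple loop. This is a sketch under the assumption that the FIRST set of every individual grammar symbol is already known and stored in a dictionary (with a terminal t mapped to {t}); the name `first_of_sequence` and the sample FIRST sets are hypothetical.

```python
EPSILON = "ε"  # assumed marker for the empty string inside FIRST sets


def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols.

    `first` maps each symbol to its FIRST set; a terminal t maps to {t}.
    """
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:
            # This symbol cannot vanish, so later symbols are never reached.
            return result
    # Every symbol was nullable (or the sequence was empty).
    result.add(EPSILON)
    return result


# Hypothetical FIRST sets: B and C are nullable non-terminals, d is a terminal.
first = {"B": {"b", EPSILON}, "C": {"c", EPSILON}, "d": {"d"}}
print(sorted(first_of_sequence(["B", "d"], first)))
print(sorted(first_of_sequence(["B", "C"], first)))
```

The empty sequence yields {ε}, which matches the rule that ε is in the result only when every symbol is nullable.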

Notes:

1. I use the common convention that capital letters (A...) refer to non-terminals, lower-case letters (a...) refer to grammar symbols (terminals or non-terminals) and Greek letters (α...) refer to possibly empty sequences of grammar symbols.
2. Aside from the first step when the start symbol is predicted, the current prediction always contains more than one symbol. So if A is the next non-terminal to expand and we see that it is nullable (i.e., it could derive nothing), we don't really need to look up FOLLOW(A) because we could just look at the predict stack and see what we've predicted will follow A. In some cases, this might allow us to avoid a conflict with one of the other alternatives for A.
However, it is normal to use FOLLOW(A) regardless. Always using FOLLOW(A) is usually referred to as the "Strong LL" (SLL) algorithm. Although it seems like computing the FIRST set of the known prediction stack is more powerful than using a precomputed FOLLOW set, it does not actually improve the power of LL parsing at all: with one token of lookahead every LL(1) grammar is already SLL(1), and with k tokens of lookahead every LL(k) grammar can be converted to an SLL(k) grammar.
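
To see the whole procedure end to end in the Strong LL style (always using FOLLOW(A)), here is a small Python sketch. The grammar S → A b, A → a A | ε is a made-up example, and the helper names (`first_seq`, `compute_first`, `compute_follow`, `predict`) are hypothetical. FIRST and FOLLOW are computed by repeating the rules until nothing changes, which is one common approach rather than the only one.

```python
EPSILON, END = "ε", "$"  # assumed markers: empty string and end of input

# Hypothetical grammar: S → A b ; A → a A | ε (empty list means ε)
GRAMMAR = [("S", ["A", "b"]), ("A", ["a", "A"]), ("A", [])]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
START = "S"


def first_seq(symbols, first):
    """FIRST of a sequence, looking through nullable symbols."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        out |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return out
    out.add(EPSILON)
    return out


def compute_first():
    """Grow FIRST sets of the non-terminals until a fixed point is reached."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_seq(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first):
    """Grow FOLLOW sets of the non-terminals until a fixed point is reached."""
    follow = {n: set() for n in NONTERMINALS}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_seq(rhs[i + 1:], first)
                new = tail - {EPSILON}
                if EPSILON in tail:
                    # Everything after sym can vanish, so FOLLOW(lhs) applies.
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def predict(lhs, rhs, first, follow):
    """PREDICT(lhs → rhs) using the precomputed FOLLOW set (Strong LL)."""
    f = first_seq(rhs, first)
    if EPSILON in f:
        return (f | follow[lhs]) - {EPSILON}
    return f


first = compute_first()
follow = compute_follow(first)
for lhs, rhs in GRAMMAR:
    shown = " ".join(rhs) or EPSILON
    print(f"PREDICT({lhs} → {shown}) = {sorted(predict(lhs, rhs, first, follow))}")
```

In this example the two alternatives for A get the disjoint predict sets {a} and {b}, so a recursive descent parser can choose between them by looking at a single token.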