

Deterministic Context-Free Grammar versus Context-Free Grammar?

Original post: 2014-03-20 02:18:56. Tags: parsing, programming-languages, big-o, context-free-grammar, context-free-language

I'm reading my notes for my comparative languages class and I'm a bit confused...

What is the difference between a context-free grammar and a deterministic context-free grammar? I'm specifically reading that parsers are O(n^3) for CFGs while compilers are O(n) for DCFGs, and I don't really understand how the difference in time complexity could be that great (not to mention that I'm still confused about which characteristics make a CFG a DCFG).

Thank you so much in advance!

1 Answer

Conceptually they are quite simple to understand. The context-free grammars are those which can be expressed in BNF. The DCFGs are the subset for which a workable parser, one that runs in time linear in the length of the input, can be written.

In writing compilers we are only interested in DCFGs. The reason is that 'deterministic' means, roughly, that the next rule to be applied at any point in the parse is determined by the input so far and a finite amount of lookahead. Knuth invented the LR(k) parser back in the 1960s and proved it could handle any deterministic context-free language. Since then some refinements, especially LALR(1), and the related top-down class LL(1), have defined grammars that can be parsed in limited memory, along with techniques by which we can write parsers for them.
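To make 'determined by a finite amount of lookahead' concrete, here is a minimal sketch of a hand-written recogniser for the toy grammar S -> '(' S ')' S | empty. The grammar and the function name `parse_balanced` are illustrative assumptions, not something from the course notes. One symbol of lookahead is enough to choose the rule, so each input character is examined a bounded number of times and the running time is O(n).

```python
def parse_balanced(text: str) -> bool:
    """Recognise S -> '(' S ')' S | empty using one symbol of lookahead."""
    pos = 0

    def peek():
        # The single lookahead symbol, or None at end of input.
        return text[pos] if pos < len(text) else None

    def parse_s() -> bool:
        nonlocal pos
        # Lookahead '(' selects S -> '(' S ')' S; the loop plays the role
        # of the trailing S in that rule.
        while peek() == '(':
            pos += 1                 # consume '('
            if not parse_s():
                return False
            if peek() != ')':
                return False
            pos += 1                 # consume ')'
        # Any other lookahead selects S -> empty.
        return True

    # The whole input must be consumed for the string to be accepted.
    return parse_s() and pos == len(text)
```

The parser never backtracks: once a character has been consumed it is never looked at again, which is where the linear bound comes from.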

We also have techniques to derive parsers automatically from the BNF, if we know the grammar belongs to one of these classes. Yacc, Bison and ANTLR are familiar examples.

I've never seen a parser for a non-deterministic CFG (NDCFG), but at any point in the parse it would potentially need to consider the whole of the input string and every possible parse that could be applied. It's not hard to see why that would get rather large and slow.
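The O(n^3) figure in the question is the worst-case bound of general context-free algorithms such as CYK and Earley. As a rough sketch of why, here is CYK for a grammar in Chomsky normal form. The grammar chosen (even-length palindromes over a and b, a standard example of a language with no deterministic grammar) and the names `BINARY`, `TERMINAL` and `cyk` are assumptions made for illustration.

```python
# Chomsky-normal-form grammar for non-empty even-length palindromes:
#   S -> A X | B Y | A A | B B,  X -> S A,  Y -> S B,  A -> 'a',  B -> 'b'
BINARY = {
    ("A", "X"): {"S"}, ("B", "Y"): {"S"},
    ("A", "A"): {"S"}, ("B", "B"): {"S"},
    ("S", "A"): {"X"}, ("S", "B"): {"Y"},
}
TERMINAL = {"a": {"A"}, "b": {"B"}}


def cyk(word: str, start: str = "S") -> bool:
    """Return True if `word` is derivable from `start` in the grammar above."""
    n = len(word)
    if n == 0:
        return False
    # table[i][length] holds the nonterminals that derive word[i:i+length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(TERMINAL.get(ch, ()))
    # Three nested loops over positions: span length, start, split point.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY.get((left, right), set())
    return start in table[0][n]
```

The parser cannot tell where the middle of the palindrome is while reading left to right, so it tries every split point of every substring. Those three nested position loops are the n^3; the two inner loops depend only on the grammar size.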

I should point out that many real languages are imperfect, in that they are not entirely context-free, are ambiguous, or otherwise depart from the ideal DCFG. C/C++ is a good example, but there are many others. These languages are usually handled by special-purpose rules such as semantic or syntactic predicates, special-case backtracking or other 'tricks', with no significant effect on performance.

The comments point out that certain kinds of NDCFG are common and that many tools provide a way to handle them. One common problem is ambiguity. It is relatively easy to parse an ambiguous grammar by introducing a simple local semantic rule, but of course this can only ever generate one of the possible parse trees. A generalised parser for NDCFGs would potentially produce all parse trees, and could perhaps allow those trees to be filtered on some arbitrary condition. I don't know of any such parser.
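A small sketch may help show the difference between 'all parse trees' and 'the one tree picked by a local rule'. It assumes the ambiguous toy grammar E -> E '-' E | NUM and a pre-tokenised input; the helper names `all_parses`, `left_assoc` and `evaluate` are hypothetical.

```python
def all_parses(tokens):
    """Return every parse tree of E -> E '-' E | NUM for the token list."""
    if len(tokens) == 1:
        return [tokens[0]]            # E -> NUM
    trees = []
    # Any '-' in the list may serve as the topmost operator.
    for i in range(1, len(tokens), 2):
        if tokens[i] != '-':
            continue
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((left, '-', right))
    return trees


def left_assoc(tokens):
    """Apply the local rule "'-' groups to the left" to pick a single tree."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tree, '-', tokens[i + 1])
    return tree


def evaluate(tree):
    """Compute the value of a tree built by the functions above."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)
```

For the tokens `[7, '-', 2, '-', 1]`, `all_parses` returns two trees, which evaluate to 6 and 4, while `left_assoc` returns only the tree that evaluates to 4. The number of trees grows quickly with the number of operators, which is one reason enumerating them all is expensive.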

Left recursion is not specific to NDCFGs. It presents a particular challenge for the design of LL(k) parsers but no problem for LR(k) parsers.
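The usual workaround on the LL side is to rewrite the grammar: A -> A a | b becomes A -> b A', A' -> a A' | empty. Below is a minimal sketch of that textbook transformation for immediate left recursion only. The grammar representation (each production is a list of symbols) and the function name are assumptions, and the sketch does not handle indirect left recursion or a head with no non-recursive alternative.

```python
def eliminate_immediate_left_recursion(head, productions):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | empty.

    `productions` is a list of right-hand sides, each a list of symbols;
    an empty list stands for the empty production.
    """
    # Tails 'a' of the left-recursive alternatives A -> A a.
    recursive = [p[1:] for p in productions if p and p[0] == head]
    # Alternatives 'b' that do not start with A.
    others = [p for p in productions if not p or p[0] != head]
    if not recursive:
        return {head: productions}    # nothing to rewrite
    tail = head + "'"                 # fresh nonterminal A'
    return {
        head: [p + [tail] for p in others],
        tail: [p + [tail] for p in recursive] + [[]],
    }
```

Applied to `E -> E - T | T`, this yields `E -> T E'` and `E' -> - T E' | empty`, which an LL(1) parser can handle with a simple loop. An LR parser needs no such rewrite and works on the left-recursive form directly.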