
Context-Free Grammar: Differentiating Integer and Floating-Point Constants

Original post: 2016-02-19 20:53:55. Tags: python, c, parsing, context-free-grammar

I am writing an LR(1) parser, and I've been basing my test grammar on the C language. I've looked at the grammars for both C and Python:

https://www.lysator.liu.se/c/ANSI-C-grammar-y.html
https://docs.python.org/3/reference/grammar.html

C seems to use the symbol CONSTANT for integer and floating-point constants, and Python uses NUMBER.

What I'm wondering is why these are not separated into individual symbols, such as INT and FLOAT, so that they can later be put into separate nodes in the abstract syntax tree.

Since we already know what type of number it is after the lexer has parsed it, why merge them into a generic 'NUMBER' and later try to figure out which one it is again?

1 Answer

Being able to handle some special cases earlier does not simplify things, since you still need the same code in a different place later. For example, consider the code y + z. Python doesn't know what that is, other than that at run time it will invoke y.__add__(z). The code to generate that call isn't going away. That same code can take 3 + z and just as easily generate (3).__add__(z). So distinguishing between y + z and 3 + z during parsing doesn't really simplify anything. (The same logic holds if the left operand is a float literal rather than an identifier.)
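As a rough illustration of the point above, the standard-library ast module can be used to inspect what the parser produces for these expressions. This is only a sketch: the helper name left_operand is made up for this example, and it assumes Python 3.8 or later, where numeric literals are reported as ast.Constant nodes.

```python
import ast

def left_operand(source):
    """Parse a single expression and describe its top node and left operand.

    Returns (expression node type, left operand node type, literal value type).
    The last item is None when the left operand is not a literal.
    """
    expr = ast.parse(source, mode="eval").body
    left = expr.left
    value_type = None
    if isinstance(left, ast.Constant):
        # The int/float distinction lives in the value, not in the node type.
        value_type = type(left.value).__name__
    return type(expr).__name__, type(left).__name__, value_type

for src in ("y + z", "3 + z", "3.0 + z"):
    print(src, "->", left_operand(src))
```

All three expressions come back as the same kind of binary-operation node, and the integer and float literals share one node type. The int-versus-float information is still available on the node's value for any later pass that needs it.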

Now consider something like 3.0 + 5. Separate code exists to replace this with 8.0 instead of (3.0).__add__(5) prior to byte-code compilation, because 1) it's simple to do and 2) it is demonstrably better than invoking a function at run time. However, this still isn't done by the parser. It is done by an optimizer that walks the tree looking for things like NUMBER + NUMBER. Once one is found, the optimizer can determine whether the NUMBERs are ints or floats and produce the appropriate sum to include in the code. This is simpler than having to handle four different shapes of parse tree: INT + FLOAT, FLOAT + INT, FLOAT + FLOAT, and INT + INT.
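A minimal sketch of such a tree-walking pass, written against Python's own ast module, might look like the following. It is not CPython's actual optimizer. The names FOLDABLE, is_number, FoldNumbers, and fold are invented for this example, only three operators are handled, and Python 3.8 or later is assumed.

```python
import ast
import operator

# Hypothetical table for this sketch: only a few arithmetic operators.
FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}

def is_number(node):
    """True for an int or float literal (bool is excluded on purpose)."""
    return (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    )

class FoldNumbers(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = FOLDABLE.get(type(node.op))
        # One rule covers every int/float combination: NUMBER op NUMBER.
        if op is not None and is_number(node.left) and is_number(node.right):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node

def fold(source):
    """Parse one expression, fold numeric constants, return the new root node."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(FoldNumbers().visit(tree))
    return tree.body

node = fold("3.0 + 5")
print(type(node).__name__, repr(node.value))
```

The single check in visit_BinOp stands in for the four INT/FLOAT combinations, because ordinary Python arithmetic on the literal values already decides whether the result is an int or a float. An expression such as 3 + z is left untouched, since its right operand is not a literal.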