
Binary and unary minus operator in Lexical Analyzer

Original post: 2023-01-20 06:19:02 | Tags: compiler-construction, flex-lexer, lexical-analysis

So, I am doing a lexical analysis of a TOY programming language using flex. I am currently stuck at the following point.

Minus sign: As we know, the minus sign can have two meanings, as a binary or as a unary operator (I also know you can discard the two meanings and just say that -2 is the same as 0-2). Firstly, I have only studied lexical analyzers so far and I don't know anything about parsers. So, should I care about distinguishing these two minus signs, because sometimes the analyzer will print -2 as a numeric literal, and sometimes - as an operator and 2 as a numeric literal? Or can this kind of clubbing of two tokens be done during parsing? If yes, then should I only define positive numbers and - as a unary operator?

1 answer

Things get really complicated if you try to analyse signed integers as single tokens. For example, you will have to avoid analysing x-1 as { x, -1 }, since that - is an operator.
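To make the x-1 example concrete, here is a minimal sketch of a tokenizer whose number pattern matches digits only, so the sign is never part of the literal. It is written in Python for brevity rather than as a flex specification, and the token names (NUMBER, IDENT, MINUS and so on) are made up for illustration; in flex the same idea would be a digits-only pattern plus a separate rule for -.

```python
import re

# Hypothetical token names for a toy language; they are not from the original post.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),           # digits only: the sign is never part of the literal
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("MINUS",  r"-"),
    ("PLUS",   r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),           # whitespace is matched and then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs; raise SyntaxError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this sketch, tokenize("x-1") gives IDENT, MINUS, NUMBER, and tokenize("-2") gives MINUS, NUMBER. The lexer emits the same MINUS token in both cases and leaves the unary/binary decision to a later stage.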

If you analyse signed integers as two tokens (a sign and then an integer), then you will get a sensible parse. There are only two issues:
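As a rough illustration of how a parser can tell the two minus signs apart purely by position, here is a small recursive-descent sketch in Python. The token shape ((kind, text) pairs), the token names, and the nested-tuple tree format are all assumptions made for this example, not something from the question. A minus seen where an operand is expected is treated as unary; a minus seen after a complete operand is treated as binary.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple tree.

    Grammar (hypothetical, for illustration only):
        expr    := unary (("+" | "-") unary)*
        unary   := "-" unary | primary
        primary := NUMBER | IDENT | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def primary():
        kind = peek()
        if kind == "NUMBER":
            return ("num", int(advance()[1]))
        if kind == "IDENT":
            return ("var", advance()[1])
        if kind == "LPAREN":
            advance()
            node = expr()
            if peek() != "RPAREN":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {kind}")

    def unary():
        # A minus where an operand is expected is the unary operator.
        if peek() == "MINUS":
            advance()
            return ("neg", unary())
        return primary()

    def expr():
        node = unary()
        # A minus after a complete operand is the binary operator.
        while peek() in ("PLUS", "MINUS"):
            op = "add" if advance()[0] == "PLUS" else "sub"
            node = (op, node, unary())
        return node

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()}")
    return tree
```

Given the tokens for x-1, this produces ("sub", ("var", "x"), ("num", 1)); given the tokens for -2, it produces ("neg", ("num", 2)). The lexer never needs to know which kind of minus it saw.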

1. You really don't want to evaluate -1 at runtime. It's a constant value, and it should be evaluated at compile time. But that's no different from any other constant expression. (1+1) should ideally be folded to 2 at compile time. Constant folding is a relatively easy optimization to implement, although you should get things working first.
2. In two's complement, the smallest negative integer (-2147483648 if you're using signed 32-bit integers) causes an integer overflow if the integer part is separated from the sign, because the integer part on its own is 2147483648, which is one more than the largest representable value. There are a number of ways of dealing with this, most of which lead to some kind of parsing quirk. For example, a C compiler targeting a 32-bit int will assign a wider type than int to 2147483648, and that means that -2147483647 and -2147483648 are unexpectedly different types. (That's why INT_MIN is usually #defined as (-2147483647 - 1).) On the other hand, if you just assume that the compiler is running in a two's-complement environment with integers wrapping around, then you'll get the right value for -2147483648. But you'll get the same value for 2147483648, which should be flagged with a diagnostic. For a student project with a toy language, these quirks are probably innocuous, but you should make some attempt to document them.
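The two issues above can be sketched together. The Python example below folds constant subtrees of a nested-tuple expression tree (a tree format assumed for this illustration, with node kinds "num", "var", "neg", "add" and "sub") and shows one possible way to handle the edge literal: 2147483648 is accepted only when it is the direct operand of unary minus, and is otherwise reported as an overflow. This is just one of the quirky options the answer alludes to, not the only reasonable one. The helper wrap32 simulates the other option, silent 32-bit wrap-around, to show why 2147483648 and -2147483648 end up with the same value under it.

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def check(value):
    """Return value if it fits in a signed 32-bit integer, else raise OverflowError."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError(f"constant {value} does not fit in 32 bits")
    return value


def wrap32(value):
    """Simulate two's-complement wrap-around to 32 bits (the 'just let it wrap' option)."""
    return (value + 2**31) % 2**32 - 2**31


def fold(node):
    """Fold constant subtrees of a nested-tuple tree and range-check literals."""
    kind = node[0]
    if kind == "num":
        if node[1] > INT_MAX:
            raise OverflowError(f"integer literal {node[1]} does not fit in 32 bits")
        return node
    if kind == "var":
        return node
    if kind == "neg":
        operand = node[1]
        # Quirk chosen for this sketch: 2147483648 is legal only directly under unary minus.
        if operand == ("num", INT_MAX + 1):
            return ("num", INT_MIN)
        operand = fold(operand)
        if operand[0] == "num":
            return ("num", check(-operand[1]))
        return ("neg", operand)
    if kind in ("add", "sub"):
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if kind == "add" else left[1] - right[1]
            return ("num", check(value))
        return (kind, left, right)
    raise ValueError(f"unknown node kind {kind!r}")
```

Here fold(("add", ("num", 1), ("num", 1))) returns ("num", 2), and fold(("neg", ("num", 2147483648))) returns ("num", -2147483648), while a bare ("num", 2147483648) raises OverflowError, which plays the role of the diagnostic. By contrast, wrap32(2147483648) and wrap32(-2147483648) both give -2147483648, which is the silent collision described above.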