
Binary and unary minus operator in Lexical Analyzer

So, I am writing a lexical analyzer for a toy programming language using flex. I am currently stuck at the following point.

Minus sign: As we know, the minus sign can have two meanings, depending on whether it is a binary or a unary operator (I also know you can drop the distinction and just say that -2 is the same as 0-2). Firstly, I have only studied lexical analyzers so far and I don't know anything about parsers. So, should I care about distinguishing these two minus signs? The concern is that sometimes the analyzer will print -2 as a numeric literal, and sometimes - as an operator and 2 as a numeric literal. Or can this kind of combining of two tokens be done during parsing? If yes, should I define only non-negative numeric literals and treat - as a unary operator?

Things get really complicated if you try to analyse signed integers as single tokens. For example, you will have to avoid analysing x-1 as { x, -1 }, since that - is an operator.
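The alternative, where the scanner never attaches a sign to a number, can be sketched in plain C. This is only an illustration under assumptions: the token names (TOK_IDENT, TOK_INT, TOK_MINUS, ...) and the function next_token are hypothetical and stand in for whatever a flex scanner's rule actions would return. With this approach x-1 scans as identifier, minus, integer, and -2 scans as minus, integer.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; a flex scanner would
   return similar codes from its rule actions. */
enum tok_kind { TOK_IDENT, TOK_INT, TOK_MINUS, TOK_EOF, TOK_ERROR };

/* Scan one token starting at *p and advance *p past it.
   The token text is copied into buf (truncated to cap-1 characters);
   cap must be at least 1. */
static enum tok_kind next_token(const char **p, char *buf, size_t cap)
{
    const char *s = *p;
    size_t n = 0;
    enum tok_kind kind;

    while (isspace((unsigned char)*s))
        s++;

    if (*s == '\0') {
        kind = TOK_EOF;
    } else if (*s == '-') {
        /* Always its own token: never glued onto a following number. */
        if (n + 1 < cap) buf[n++] = *s;
        s++;
        kind = TOK_MINUS;
    } else if (isdigit((unsigned char)*s)) {
        /* Integer literals are unsigned: digits only. */
        while (isdigit((unsigned char)*s)) {
            if (n + 1 < cap) buf[n++] = *s;
            s++;
        }
        kind = TOK_INT;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_') {
            if (n + 1 < cap) buf[n++] = *s;
            s++;
        }
        kind = TOK_IDENT;
    } else {
        /* Anything else is outside this sketch's tiny alphabet. */
        if (n + 1 < cap) buf[n++] = *s;
        s++;
        kind = TOK_ERROR;
    }

    buf[n] = '\0';
    *p = s;
    return kind;
}
```

Deciding whether a given TOK_MINUS is unary or binary is then left entirely to the parser, which knows whether it is expecting an operand or an operator at that point.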

If you analyse signed integers as two tokens (a sign and then an integer), then you will get a sensible parse. There are only two issues:

  1. You really don't want to evaluate -1 at runtime. It's a constant value, and it should be evaluated at compile time. But that's no different from any other constant expression. (1+1) should ideally be folded to 2 at compile time. Constant folding is a relatively easy optimization to implement, although you should get things working first.

  2. In two's complement, the smallest negative integer (-2147483648 if you're using signed 32-bit integers) causes an integer overflow if the integer part is separated from the sign, because 2147483648 on its own does not fit in the signed type. There are a number of ways of dealing with this, most of which lead to some kind of parsing quirk. For example, a C compiler targeting a 32-bit int will assign a wider type than int to 2147483648, and that means that -2147483647 and -2147483648 unexpectedly have different types. (That's why INT_MIN is usually #defined as (-2147483647 - 1).) On the other hand, if you just assume that the compiler is running in a 2's-complement environment with integers wrapping around, then you'll get the right value for -2147483648. But you'll get the same value for 2147483648, which should be flagged with a diagnostic. For a student project with a toy language, these quirks are probably innocuous, but you should make some attempt to document them.
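One possible way to handle both issues for the simplest case, a unary minus applied directly to an integer literal, is sketched below in C. It is an assumption-laden illustration, not the approach the answer prescribes: fold_int_literal and enum fold_status are hypothetical names, the literal is assumed to arrive as a string of decimal digits, and the target type is assumed to be a 32-bit signed integer. The magnitude is read into a wider unsigned type, the negation is folded at compile time, and 2147483648 without a minus sign is reported instead of silently wrapping.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum fold_status { FOLD_OK, FOLD_OUT_OF_RANGE };

/* digits:  text of an unsigned integer token (decimal digits only).
   negated: nonzero if the parser saw a unary minus applied directly
            to this literal.
   On FOLD_OK the folded 32-bit value is stored in *out. */
static enum fold_status fold_int_literal(const char *digits, int negated,
                                         int32_t *out)
{
    char *end;
    unsigned long long mag;

    errno = 0;
    mag = strtoull(digits, &end, 10);
    if (errno == ERANGE || end == digits || *end != '\0')
        return FOLD_OUT_OF_RANGE;

    if (negated) {
        /* The magnitude may be one larger than INT32_MAX: that is INT32_MIN. */
        if (mag > (unsigned long long)INT32_MAX + 1)
            return FOLD_OUT_OF_RANGE;
        /* Negate in 64 bits so that 2147483648 never overflows. */
        *out = (int32_t)(-(int64_t)mag);
        return FOLD_OK;
    }

    /* Without a sign, 2147483648 does not fit and should be diagnosed. */
    if (mag > (unsigned long long)INT32_MAX)
        return FOLD_OUT_OF_RANGE;
    *out = (int32_t)mag;
    return FOLD_OK;
}
```

This choice has its own quirk, which would be worth documenting: -2147483648 is accepted only when the minus applies directly to the literal, so -(2147483648) would still be rejected.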


 