Macro expansion with unary minus

Consider the following code:

#define A -100

//later..
void Foo()
{
  int bar = -A;
  //etc..
}

Now, this compiles fine on the major compilers I tested (MSVC, GCC, Clang), and bar == 100 as expected. This is because the preprocessors of all those compilers insert a space between the tokens, so you end up with:

int bar = - -100;

As I'd like my code to be as portable as possible, I went to check whether this behavior is defined by the standard, but I can't find anything on it. Is this behavior guaranteed by the standard, or is it just a compiler feature? And is the naive expansion bar = --100; (which obviously wouldn't compile) allowed too?

This is specified by the language: the two - characters will never end up concatenated to form a -- operator.

This absence of concatenation is guaranteed by the way source files must be parsed. Macro expansion is performed in translation phase 4. Before that, during translation phase 3, the source file must be transformed into a sequence of preprocessing tokens and white space [lex.phases]/3:

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified.

So after translation phase 3, the sequence of tokens around the definition of bar may look like this:

// here {...,...,...} is used to list preprocessing tokens.
{int, ,bar, ,=, ,-,A,;}

Then, after phase 4, you will get:

{int, ,bar, ,=, ,-,-,100,;}

White space is conceptually removed in phase 7, where each preprocessing token is converted into a token:

{int,bar,=,-,-,100,;}

Once the input has been split into preprocessing tokens at an early stage of translation, the only way to merge two adjacent preprocessing tokens into one token is the preprocessor's ## operator. That is what the ## operator is for, and that is why it is necessary.

Once preprocessing is complete, the compiler proper analyzes the code in terms of the already-formed preprocessing tokens. It will not attempt to merge two adjacent tokens into one.

In your example, the inner - and the outer - are two different preprocessing tokens. They will not merge into a single -- token, and the compiler proper will not see them as one -- token either.

For example:

#define M1(a, b) a-b
#define M2(a, b) a##-b

int main()
{
  int i = 0;
  int x = M1(-, i); // interpreted as `int x = -(-i);`
  int y = M2(-, i); // interpreted as `int y = --i;` 
}

This is how the language specification defines the behavior.

In practical implementations, the preprocessing stage and the compilation stage are usually decoupled from each other, and the output of the preprocessing stage is typically represented as plain text (not as some database of tokens). In such implementations, the preprocessor and the compiler proper have to agree on a convention for separating adjacent ("touching") preprocessing tokens. Typically, the preprocessor inserts an extra space between two separate tokens that happen to "touch" after macro expansion.

The standard does not say anything about that extra space, and formally it is not supposed to be there, but this is how the separation is typically implemented in practice.

Note that since that space "is not supposed to be there", such implementations also have to make some effort to ensure that the extra space is "undetectable" in other contexts. For example:

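#include <iostream>

#define M1(a, b) a-b
#define M2(a, b) a##-b

#define S_(x) #x
#define S(x) S_(x)  // expand the argument first, then stringize it

int main()
{
  // `i` is only stringized here, so it does not need to be declared
  std::cout << S(M1(-, i)) << std::endl; // outputs `--i`
  std::cout << S(M2(-, i)) << std::endl; // outputs `--i`
}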

Both lines of main are supposed to output --i.

So, to answer your original question: yes, your code is portable in the sense that, in a standard-compliant implementation, those two - characters will never become a --. But the actual insertion of a space is just an implementation detail. Some other implementation might use a different technique to prevent those two - tokens from merging into a --.
