
Correct behavior for string literal concatenation (C++11 phase 6 of translation)

I'm pretty sure that Visual C++ 2015 has a bug here, but I don't feel 100% sure.

Code:

// Encoding: UTF-8 with BOM (required by Visual C++).
#include <stdlib.h>

auto main()
    -> int
{
    auto const s = L""
        "𐐷 is not in the Unicode BMP!";
    return s[0] > 256? EXIT_SUCCESS : EXIT_FAILURE;
}

Result with g++:

[H:\scratchpad\simple_text_io]
> g++ --version | find "++"
g++ (i686-win32-dwarf-rev1, Built by MinGW-W64 project) 6.2.0

[H:\scratchpad\simple_text_io]
> g++ compiler_bug_demo.cpp

[H:\scratchpad\simple_text_io]
> run a
Process exit code = 0.

[H:\scratchpad\simple_text_io]
> _

Result with Visual C++:

[H:\scratchpad\simple_text_io]
> cl /nologo- 2>&1 | find "++"
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x86

[H:\scratchpad\simple_text_io]
> cl compiler_bug_demo.cpp /Feb
compiler_bug_demo.cpp
compiler_bug_demo.cpp(8): warning C4566: character represented by universal-character-name '\U00010437' cannot be represented in the current code page (1252)

[H:\scratchpad\simple_text_io]
> run b
Process exit code = 1.

[H:\scratchpad\simple_text_io]
> _

Is there any UB involved, and if not, which compiler behaves correctly?

Addendum:

The behavior is unchanged for both compilers if I use the lowercase Greek letter pi, “π”, which is in the BMP. So whether the character lies outside the BMP doesn't seem to matter.
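This variation of the test makes the same comparison for both characters without depending on how the source file is saved (BOM, code page). It spells the characters as universal-character-names instead. This is a sketch, not code from the original post, and the function names are hypothetical. The expected values assume the usual wide encodings: UTF-32 for a 32-bit wchar_t (as with GCC on Linux) or UTF-16 for a 16-bit wchar_t (as on Windows).

```cpp
// Sketch only: hypothetical helper names, not code from the original post.
// Universal-character-names stand in for the literal characters, so the
// result does not depend on how the source file is saved (BOM, code page).

// First code unit of L"" concatenated with an unprefixed literal that
// starts with U+10437 DESERET SMALL LETTER YEE (outside the BMP).
constexpr unsigned long first_unit_non_bmp() {
    return static_cast<unsigned long>(
        (L"" "\U00010437 is not in the Unicode BMP!")[0]);
}

// The same check for U+03C0 GREEK SMALL LETTER PI (inside the BMP).
constexpr unsigned long first_unit_pi() {
    return static_cast<unsigned long>((L"" "\u03C0 is in the BMP")[0]);
}

// The question's criterion, with 255 as the threshold: a value above it
// cannot be a single byte from a code page such as 1252.
constexpr bool looks_wide(unsigned long unit) {
    return unit > 255;
}
```

Under that assumption, first_unit_pi() is 0x3C0 with either encoding. first_unit_non_bmp() is 0x10437 with UTF-32, or the high surrogate 0xD801 with UTF-16. If the compiler had first converted the unprefixed literal to code page 1252, it would instead hold an implementation-defined single-byte replacement. The exit code 1 from Visual C++ suggests that is what happened there.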

From [lex.string]:

  In translation phase 6, adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior. [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a literal has been translated into a value from the appropriate character set), a string literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. —end note ] Table 8 has some examples of valid concatenations.
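The rules in this paragraph can be checked at compile time. This sketch needs C++11 or later and was written for this page; it is not copied from the standard's Table 8. It asserts the type or size of a few concatenations. The ill-formed case (a UTF-8 literal next to a wide literal) appears only in a comment, because it would not compile.

```cpp
// Compile-time checks of the [lex.string] concatenation rules (C++11 or later).
#include <type_traits>

// An unprefixed literal adjacent to a prefixed one takes that prefix, on
// either side: both of these behave like L"abc" (3 characters + NUL).
static_assert(std::is_same<decltype(L"" "abc"), const wchar_t (&)[4]>::value,
              "prefix on the left");
static_assert(std::is_same<decltype("abc" L""), const wchar_t (&)[4]>::value,
              "prefix on the right");

// Same prefix on both operands: the result keeps it (char16_t here).
static_assert(std::is_same<decltype(u"a" u"b"), const char16_t (&)[3]>::value,
              "same prefix");

// Rawness has no effect: R"(\n)" contributes a backslash and an 'n',
// not a newline, and the result is an ordinary wide literal.
static_assert(sizeof(L"" R"(\n)") == 3 * sizeof(wchar_t), "two chars + NUL");
static_assert((L"" R"(\n)")[0] == L'\\', "backslash kept verbatim");

// u8"a" L"b" would be ill-formed (UTF-8 next to wide), so it is not shown.
```

If any of these assertions fails, the compiler is not following the quoted rule for that case.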

So there is no UB here. However, phase 5 of translation might already have changed the values of some characters:

  5. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
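The Visual C++ result hinges on the order of the two phases. This sketch contrasts two cases:

- A narrow literal on its own, whose contents are implementation-defined.
- The same unprefixed literal concatenated to u"".

For the second case, the sketch reads the note's "appropriate character set" as the one belonging to the combined literal, so each character is translated into UTF-16. The u prefix is used only because its encoding does not vary between compilers. The variable names are illustrative.

```cpp
// Illustration of why the order of phases 5 and 6 matters (C++11 or later).
// Reading of the note in [lex.string] assumed here: each character is
// translated into the character set of the concatenated literal, so the
// unprefixed "\U00010437" below is encoded as UTF-16, not via the narrow set.
#include <cstddef>

// Narrow literal on its own: its contents are implementation-defined.
// With a UTF-8 narrow execution character set (GCC's default) this is
// 4 bytes + NUL; code page 1252 has no corresponding member at all.
constexpr std::size_t narrow_size = sizeof("\U00010437");

// The same unprefixed literal concatenated with u"": a UTF-16 surrogate
// pair (0xD801, 0xDC37) plus the terminating NUL.
constexpr std::size_t utf16_units = sizeof(u"" "\U00010437") / sizeof(char16_t);
constexpr char16_t high = (u"" "\U00010437")[0];
constexpr char16_t low  = (u"" "\U00010437")[1];
```

A compiler that converted the unprefixed literal to the narrow execution character set first, before phase 6, could not produce the surrogate pair when that set lacks the character. That matches the pattern of the C4566 warning followed by exit code 1.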
