

Is it mandatory to escape tabulator characters in C and C++?

In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are denoted in escaped form as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, as for example in "A	B" (there is a TAB between A and B), and at least clang++ does not seem to complain: the string seems to be equivalent to "A\tB". I like the unescaped version better, since long indented multi-line strings are more readable in the source code.

Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?

Surprisingly, I could not find an answer to this seemingly simple question, neither with Google nor on Stack Overflow (I just found this vaguely related question).

Yes, you can include a tab character in a string or character literal, at least according to C++11. The allowed characters include:

any member of the source character set except the double-quote ", backslash \, or new-line character

(from the C++11 standard, annex A.2)

and the source character set includes:

the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

(from the C++11 standard, paragraph 2.3.1)

UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but it basically says the same thing:

In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or [...]

where both the source and execution character sets include

control characters representing horizontal tab, vertical tab, and form feed.

It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate), and newline.

So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.

If you enter a tab into an input, then your string will contain a literal tab character, and it will stay a tab character; it won't be magically translated into \t internally.

The same goes for writing code: you can embed literal tab characters in your strings. However, consider this:

     T     T     T        <--tab stops
012345012345012345012345
foo1 = "a\tb";
foo2 = "a  b"; // pressed tab in the editor
foo3 = "a  b"; // hit space twice in the editor

Unless you put the cursor on the whitespace between a and b and check how many characters are in there, there is essentially NO way to determine whether there's a tab or actual space characters in there. But with the \t version, it is immediately obvious that it is a tab.

When you press the TAB key, you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal, the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.
