
Why is “\?” an escape sequence in C/C++?

There are four special non-alphabetic characters that have their own escape sequences in C/C++: the single quote \' , the double quote \" , the backslash \\ , and the question mark \? . Apparently this is because they have special meanings: ' delimits a single char , " delimits string literals, and \ introduces escape sequences. But why is ? one of them?

I read the table of escape sequences in a textbook today and I realized that I've never escaped ? before and have never encountered a problem with it. Just to be sure, I tested it under GCC:

#include <stdio.h>
int main(void)
{
    printf("question mark ? and escaped \?\n");
    return 0;
}

And the C++ version:

#include <iostream>
int main(void)
{
    std::cout << "question mark ? and escaped \?" << std::endl;
    return 0;
}

Both programs output:

question mark ? and escaped ?

So I have two questions:

  1. Why is \? one of the escape sequence characters?
  2. Why does non-escaping ? work fine? There's not even a warning.

More interestingly, the escaped \? can be used the same as ? in some other languages as well. I tested in Lua and Ruby, and it holds there too, even though I didn't find it documented.

Why is \? one of the escape sequence characters?

Because it is special. The answer lies in trigraphs: the C/C++ preprocessor replaces each of the following three-character sequences with the corresponding single character (C11 §5.2.1.1 and C++11 §2.3).

Trigraph:       ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement:      [    ]    {    }    #    \    ^    |    ~

Trigraphs are nearly useless now and are mainly used for obfuscation. Some examples can be seen in the IOCCC.

GCC doesn't process trigraphs by default and will warn you if there's a trigraph in the code, unless the -trigraphs option is enabled. Under the -trigraphs option, escaping the question mark is useful in the following example:

printf("\?\?!\n");

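With the escapes, the three characters are printed literally. If the ? characters were not escaped, the sequence would be read as a trigraph and the output would be | .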

For more information on trigraphs, see Cryptic line "??!??!" in legacy code


Why does non-escaping ? work fine? There's not even a warning.

Because the standard allows ? (and the double quote " ) to be represented by themselves:

C11 §6.4.4.4 Character constants, paragraph 4

The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \? , respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\ .

Similarly in C++:

C++11 §2.13.2 Character literals, paragraph 3

Certain nongraphic characters, the single quote ' , the double quote " , the question mark ? , and the backslash \ , can be represented according to Table 6. The double quote " and the question mark ? , can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.
