
Why does “foo\\<NEWLINE>bar” become “foo\bar” after “gcc -E”?

See the following example:

$ cat foo.c
int main()
{
    char *p = "foo\\
bar";
    return 0;
}
$ gcc -E foo.c
# 1 "foo.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "foo.c"
int main()
{
    char *p = "foo\bar";

    return 0;
}
$

From my understanding, the 2nd \ is escaped by the 1st \, so the 2nd \ should not be combined with the following <NEWLINE> to form a line continuation.

The rules are quite explicit in ISO/IEC 9899:2011 §5.1.1.2 Translation Phases:

  2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice.

The character preceding the final backslash is not consulted. Phase 1, which runs before this, converts trigraphs into regular characters. That matters because ??/ is the trigraph for \, so a ??/ at the end of a line also forms a splice.

The preprocessor removes all occurrences of backslash-newline before even trying to tokenize the input; there is no escape mechanism for this. It's not limited to string literals either:

#inclu\
de <st\
dio.h>

int m\
ain(void) {
    /\
* yes, this is a comment */
    put\
s("Hello,\
 world!");
    return 0;
}

This is valid code.

Using \\ to get a single \ only applies to string and character literals and happens much later in processing (escape sequences are converted in translation phase 5).
