
Why does GCC emit a warning when using trigraphs, but not when using digraphs?

Code:

#include <stdio.h>

int main(void)
{
  ??< puts("Hello Folks!"); ??>
}

The above program, when compiled with GCC 4.8.1 with -Wall and -std=c11, gives the following warning:

source_file.c: In function ‘main’:
source_file.c:8:5: warning: trigraph ??< converted to { [-Wtrigraphs]
     ??< puts("Hello Folks!"); ??>
 ^
source_file.c:8:30: warning: trigraph ??> converted to } [-Wtrigraphs]

But when I change the body of main to:

<% puts("Hello Folks!"); %>

no warnings are thrown.

So, why does the compiler warn me when using trigraphs, but not when using digraphs?

The GCC documentation on preprocessing gives a pretty good rationale for the warning:

Trigraphs are not popular and many compilers implement them incorrectly. Portable code should not rely on trigraphs being either converted or ignored. With -Wtrigraphs GCC will warn you when a trigraph may change the meaning of your program if it were converted.

and the GCC documentation on tokenization explains that digraphs, unlike trigraphs, have no negative side effects:

There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, ...

Because trigraphs have the undesirable effect of silently changing code. This means that the same source file is valid both with and without trigraph replacement, but leads to different code. This is especially problematic in string literals, like "<em>What??</em>", where the ??< is replaced by { and the string becomes "<em>What{/em>".

Language design and language evolution should strive to avoid silent changes. Having the compiler warn about trigraphs is a good thing.

Contrast this with digraphs, which were new tokens that do not lead to silent changes.
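One common defence, not mentioned in the answer above, is the \? escape sequence: escaping one of the question marks keeps the three characters from ever forming a trigraph, while the resulting string stays the same. A minimal sketch, where safe_markup and check_safe_markup are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the intended markup text. The second
   question mark is written as \? so the source never contains two
   adjacent question marks followed by '<'. */
const char *safe_markup(void)
{
    return "<em>What?\?</em>";
}

/* Sanity check: the escaped spelling yields the 15 intended characters,
   whether or not the compiler performs trigraph replacement. */
void check_safe_markup(void)
{
    const char *s = safe_markup();
    assert(strlen(s) == 15);
    assert(s[8] == '?' && s[9] == '?' && s[10] == '<');
}
```

Because the escape is resolved after trigraph replacement would have happened, the literal reads the same in both modes.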

Maybe because digraphs have no negative side effects, unlike trigraphs, as stated in the GCC documentation:

Punctuators are all the usual bits of punctuation which are meaningful to C and C++. All but three of the punctuation characters in ASCII are C punctuators. The exceptions are '@', '$', and '`'. In addition, all the two- and three-character operators are punctuators. There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, but does not cover as much ground. The digraphs and their corresponding normal punctuators are:

 Digraph:        <%  %>  <:  :>  %:  %:%:
 Punctuator:      {   }   [   ]   #    ##
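To make the table concrete, here is a small sketch that uses each digraph in place of its punctuator. The function and macro names are hypothetical, and it assumes a compiler mode that recognizes digraphs (C95 or later). The point is that each digraph is a token in its own right, so it is never recognized inside a string literal:

```c
#include <assert.h>
#include <string.h>

/* <% %> stand for { }, and <: :> stand for [ ]. */
int sum_with_digraphs(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

/* %: stands for #, and %:%: stands for the ## pasting operator. */
%:define PASTE(a, b) a %:%: b

int pasted_value(void)
<%
    int PASTE(my, var) = 42;  /* declares a variable named myvar */
    return myvar;
%>

/* Inside a string literal, "<%" is just the two characters '<' and '%'.
   Nothing is rewritten, which is why no warning is needed. */
int digraph_in_string_is_untouched(void)
{
    const char *s = "<%";
    return strlen(s) == 2 && s[0] == '<' && s[1] == '%';
}
```

With these definitions, sum_with_digraphs() returns 6, pasted_value() returns 42, and digraph_in_string_is_untouched() returns 1.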

Trigraphs are nasty because they use character sequences which could legally appear within valid code. A common case which used to cause compiler errors on code for classic Macintosh:

unsigned int signature = '????';  /* Should be value 0x3F3F3F3F */

Trigraph processing would turn that into:

unsigned int signature = '??^;  /* Should be value 0x3F3F3F3F */

which would of course not compile. In some slightly rarer cases, such processing could yield code that compiles but means something different from what was intended, e.g.:

char *template = "????/1234";

which would get turned into

char *template = "??S4"; // ??/ becomes \, and \123 becomes S

Not the string literal that was intended, but perfectly legitimate nonetheless.

By contrast, digraphs are relatively benign because outside of some possible weird corner cases involving macros, no code containing processable digraphs would have a legitimate meaning in the absence of such processing.
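The two trigraph rewrites described in this answer can be reproduced without relying on any compiler flag. The following is only a sketch of the replacement step as a plain string transformation: replace_trigraphs and the two lookup tables are hypothetical names, and the later interpretation of escape sequences such as \123 is not modelled. The source deliberately avoids spelling any trigraph literally, so it compiles the same way with trigraphs enabled or disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third character of each of the nine trigraphs, and the character the
   whole three-character sequence is replaced with. Index-aligned. */
static const char TRIGRAPH_KEYS[]   = "=(/)'<!>-";
static const char TRIGRAPH_VALUES[] = "#[\\]^{|}~";

/* Hypothetical helper: copies `in` to `out`, replacing every trigraph in
   a single left-to-right scan. `out` must have room for strlen(in) + 1
   bytes, since replacement never makes the text longer. */
void replace_trigraphs(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* Two question marks followed by a known third character. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0') {
            const char *hit = strchr(TRIGRAPH_KEYS, in[2]);
            if (hit != NULL) {
                *out++ = TRIGRAPH_VALUES[hit - TRIGRAPH_KEYS];
                in += 3;
                continue;
            }
        }
        *out++ = *in++;  /* ordinary character: copy unchanged */
    }
    *out = '\0';
}
```

Fed the text '????' it produces '??^, and fed "????/1234" it produces "??\1234", matching the two cases above. Note that the first two question marks survive in both cases because three question marks in a row are not a trigraph.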
