
Digraph and trigraph can't work together?

I'm learning about digraphs and trigraphs, and here is some code that I cannot understand. (Yes, I admit that it's extremely ugly.)

This code compiles:

#define _(s) s%:%:s

main(_(_))
<%
    __;
%>

This code compiles, too:

#define _(s) s??=??=s

main(_(_))
<%
    __;
%>

However, neither of the following two pieces of code compiles:

#define _(s) s%:??=s

main(_(_))
<%
    __;
%>

And

#define _(s) s??=%:s

main(_(_))
<%
    __;
%>

This confuses me: since the first two pieces of code compile, I assumed that digraphs and trigraphs are both expanded before macro expansion. So why doesn't the code compile when a digraph and a trigraph are used together?

Digraphs and trigraphs are totally different. Trigraphs are replaced during phase 1 of translation [see Note 1], which is before the source code has been separated into tokens. Digraphs are tokens which are alternate spellings for other tokens, so they are not meaningful until after the source has been separated into tokens. (The word "digraph" is not very accurate; it is used because it resembles "trigraph", but the set of digraphs includes %:%:, which consists of four characters.)

So ??= is replaced with a # before any token analysis is done. But %: is just a token, with the same meaning as #.

Moreover, %:%: is a token with the same meaning as ##. But %:# is two tokens (%: and #), which is not legal since the stringify operator (whether spelled %: or #) can only be followed by a macro parameter [see Note 2]. And it does not become any less illegal if the # is the result of a trigraph substitution. That is what happens in the question's last two macros: after phase 1, s%:??=s reads s%:#s and s??=%:s reads s#%:s, and in both cases the first stringify operator is followed by another operator token instead of a parameter.

One important difference between digraphs and trigraphs, as illustrated by the hilarious snippet in chqrlie's answer, is that trigraphs also work in strings. Digraphs allow you to write C code even if your keyboard lacks brackets and octothorpi, but they don't help you print those characters out.


Notes (Standards quotes):

  1. §5.1.1.2, Translation phases, paragraph 1:

    The precedence among the syntax rules of translation is specified by the following phases.

    1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
  2. §6.10.3.2, The # operator, paragraph 1:

    Each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list.

For the academic side, look at rici's well-documented answer.

For the common-sense side, unless you are already quite proficient in C, digraphs and trigraphs are completely useless, and you should not waste any time on the subject. They were invented as a way to support the non-US 7-bit character sets that were still used in the 1980s on mainframes and some minicomputers. These character sets lacked some of the punctuation needed for the C language, such as #, {, } etc., to make space for locale-specific characters such as ç, é, è... (pardon my French).

Even on these systems, which I used for a long while, trigraphs were never used, because ugly pragmatic alternatives existed: on French systems, accented letters such as é and è were typed but were interpreted by the C compiler as { and }. It made C programming obscure and pushed many programmers to switch to a US QWERTY keyboard and locale (or equivalent).

This is a thing of the past, only of historical interest, and you will never see these in action, aside from typos, obfuscation and obnoxious interview questions.

Regarding the latter, I cannot resist posting this one:

I cannot get fnmatch to validate my date template even if I force a valid date. What is wrong with this code?

#include <stdio.h>
#include <fnmatch.h>
int main() {
    char date[] = "01/01/1988";
    if (fnmatch("??/??/????", date, 0))
        printf("invalid date format\n");
    return 0;
}
