
Need help understanding C code

This question references Reflections on Trusting Trust, figure 2.

Take a look at this snippet of code, from figure 2:

...
c = next( );
if(c != '\\')
    return(c);
c = next( );
if (c != '\\')
    return('\\');
if (c == 'n')
    return('\n');

It says:

This is an amazing piece of code. It "knows" in a completely portable way what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the knowledge.

I would like to read the rest of the paper. Can someone explain how the above code is recompiling itself? I'm not sure I understand how this snippet of code relates to the code in "Stage 1":

Stage 1 http://cm.bell-labs.com/who/ken/fig1.gif

The stage 2 example is very interesting because it adds an extra level of indirection on top of a self-replicating program.

What he means is that, since this compiler code is written in C, it is completely portable: it detects the presence of a literal \n and returns the character code for \n without ever knowing what that actual character code is, because the compiler was itself written in C and compiled for the system.
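To make that concrete, here is a minimal sketch (the function names are my own, not from the paper). The source never spells out a number for newline; whichever compiler builds it fills the value in. On an ASCII system that value is 10, but nothing in the code depends on that.

```c
#include <stdio.h>

/* Hypothetical helper: returns the numeric code that the compiler
   building this file assigns to a newline. The number itself never
   appears in the source. */
int newline_code(void)
{
    return '\n';
}

/* Hypothetical helper: prints the code so it can be inspected on a
   given system. */
void print_newline_code(void)
{
    printf("newline is character code %d here\n", newline_code());
}
```

Calling print_newline_code() from main shows the value your own compiler chose.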

The paper goes on to show a very interesting Trojan horse in the compiler. If you use this same technique to make the compiler insert a bug into any program, and then remove the bug from the source code, the compiler will still compile the bug into the supposedly bug-free compiler.

It is a bit confusing, but essentially it is about multiple levels of indirection.

What this piece of code does is translate escape sequences, which is part of the job of a C compiler.

c = next( );
if(c != '\\')
    return(c);

Here, if c is not '\\' (the character \ ), it is not the start of an escape sequence, so the character is returned as is.

If it is, then it's the start of an escape sequence.

c = next( );
if (c == '\\')
    return('\\');
if (c == 'n')
    return('\n');

There is a typo in your question here: it should be if (c == '\\') , not if (c != '\\') . This piece of code goes on to examine the character following the \ . If it is another \ , the whole escape sequence is \\ , so a single backslash is returned. The same goes for \n .
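If you want to run the logic yourself, here is a self-contained sketch of the same routine. The cursor struct and the string-based next() are my own stand-ins for the compiler's input stream, and decode_char is a name I made up; the paper only shows the fragment above.

```c
#include <stddef.h>

/* Hypothetical cursor over the source text; it stands in for whatever
   input stream the real compiler reads from. */
struct cursor {
    const char *text;
    size_t pos;
};

/* Returns the next raw character, or 0 at the end of the text. */
static int next(struct cursor *in)
{
    char c = in->text[in->pos];
    if (c != '\0')
        in->pos++;
    return (unsigned char)c;
}

/* Returns one decoded character, translating the two escapes shown in
   the fragment above. */
int decode_char(struct cursor *in)
{
    int c = next(in);
    if (c != '\\')
        return c;        /* ordinary character, returned unchanged */
    c = next(in);
    if (c == '\\')
        return '\\';     /* two backslashes become one backslash */
    if (c == 'n')
        return '\n';     /* backslash followed by n becomes a newline */
    return c;            /* other escapes are not handled in this sketch */
}
```

Feeding it the six characters a, \, n, \, \, b yields a, a newline, a single backslash, and b.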

The description of that code, from Ken Thompson's paper, is (emphasis added):

Figure 2 is an idealization of the code in the C compiler that interprets the character escape sequence.

So you're looking at part of a C compiler. The C compiler is written in C, so it will be used to compile itself (or, more accurately, the next version of itself). Hence the statement that the code is able to "recompile itself".
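The "perpetuating the knowledge" part is easier to see with an escape the installed compiler does not know yet. The paper uses vertical tab for this, if I remember it correctly. Below is a rough sketch of the two steps; the function names are mine, and the value 11 assumes ASCII.

```c
/* Step 1: the installed compiler does not yet understand the vertical
   tab escape, so the new escape is taught with a literal number.
   11 is vertical tab in ASCII; that value is an assumption about the
   target character set. */
int escape_value_taught(int c)
{
    if (c == '\\')
        return '\\';
    if (c == 'n')
        return '\n';
    if (c == 'v')
        return 11;
    return c;
}

/* Step 2: once a compiler built from step 1 is installed, the source
   can use the escape itself, and the number no longer appears anywhere
   in the source. The binary carries the knowledge forward. */
int escape_value_portable(int c)
{
    if (c == '\\')
        return '\\';
    if (c == 'n')
        return '\n';
    if (c == 'v')
        return '\v';
    return c;
}
```

On an ASCII system both functions return the same value for 'v'. The difference is only in where the number lives: in the source for step 1, in the compiler binary for step 2.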
