Bootstrapping a cross-platform compiler

Suppose you are designing, and writing a compiler for, a new language called Foo, among whose virtues is intended to be that it's particularly good for implementing compilers. A classic approach is to write the first version of the compiler in C, and use that to compile the second version, written in Foo, after which it becomes self-compiling.

This does mean you have to be careful to keep backup copies of the binary (as opposed to most programs where you only have to keep backup copies of the source); once the language has evolved away from the first version, if you lost all copies of the binary, you would have nothing capable of compiling the current version. So be it.

But suppose it is intended to support both Linux and Windows. As long as it is in fact running on both platforms, it can compile itself on each platform, no problem. Suppose, however, that you lost the binary on one platform (or had reason to suspect it had been compromised by an attacker): now there is a problem. And having to safeguard the binary for every supported platform is at least one more failure point than I'm comfortable with.

One solution would be to make it a cross-compiler, such that the binary on either platform can target both platforms.

This is not quite as easy as it sounds - while there is no problem selecting the binary output format, each platform provides the system API in the form of C header files, which normally only exist on their native platform, e.g. there is no guarantee that code compiled against the Windows stdio.h will work on Linux even if compiled into Linux binary format.
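To make the distinction concrete, here is a minimal sketch of how a cross-compiler might describe its targets. The `Target` class, the `headers_dir` field, and the paths are all hypothetical; nothing here comes from an actual Foo implementation. The point is only that the output format is a simple per-target setting, while the headers have to be supplied from somewhere for each target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """What a (hypothetical) Foo compiler must know about one platform."""
    name: str
    object_format: str  # the easy part: container format of the output binary
    headers_dir: str    # the hard part: that platform's own system headers


# Hypothetical paths; a real setup would point at copies of each platform's headers.
TARGETS = {
    "linux": Target("linux", "ELF", "/opt/foo/targets/linux/include"),
    "windows": Target("windows", "PE", "/opt/foo/targets/windows/include"),
}


def select_target(name: str) -> Target:
    """Look up a target by name, rejecting anything not explicitly supported."""
    try:
        return TARGETS[name]
    except KeyError:
        raise ValueError(f"unsupported target: {name!r}") from None
```

With a table like this, the same host binary can emit for either platform, provided each target's headers are actually present on the host.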

Perhaps that problem could be solved by downloading the Linux header files onto a Windows box and using the Windows binary to cross-compile a Linux binary.

Are there any caveats with that solution I'm missing?

Another solution might be to maintain a separate minimal bootstrap compiler in Python that compiles Foo into portable C. It would accept only the subset of the language needed by the main Foo compiler, and perform minimal error checking and no optimization. The intent is that the bootstrap compiler would thus remain simple enough that maintaining it across subsequent language versions wouldn't cost very much.
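As a rough illustration of how small such a bootstrap compiler could be, here is a sketch that translates a tiny made-up subset into C. Foo's syntax is not specified anywhere, so the `let` and `print` statements are invented purely for this example, and expressions are assumed to already be valid C expressions so they can be passed through unchanged.

```python
import re

_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_LET = re.compile(rf"let\s+({_IDENT})\s*=\s*(.+)")
_PRINT = re.compile(r"print\s+(.+)")


def foo_to_c(source: str) -> str:
    """Translate a tiny, invented Foo subset into a portable C program."""
    body = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if m := _LET.fullmatch(line):
            # Assumption: every variable is a long, and the expression is valid C.
            body.append(f"    long {m.group(1)} = {m.group(2)};")
        elif m := _PRINT.fullmatch(line):
            body.append(f'    printf("%ld\\n", (long)({m.group(1)}));')
        else:
            # Minimal error checking: anything outside the subset is rejected.
            raise SyntaxError(f"line {lineno}: not in bootstrap subset: {line!r}")
    lines = ["#include <stdio.h>", "", "int main(void) {", *body, "    return 0;", "}"]
    return "\n".join(lines) + "\n"
```

The output is handed to whatever C compiler the platform already has, which is what makes this route independent of any saved Foo binary.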

Again, are there any caveats with that solution I'm missing?

What methods have people used to solve this problem in the past?

This is a problem for C compilers themselves. It's typically solved by the use of a cross-compiler, exactly as you suggest.

The process of cross-compiling a compiler is no more difficult than cross-compiling any other project: that is to say, it's trickier than you'd like, but by no means impossible.

Of course, you first need the cross-compiler itself. This probably means some major surgery to your build-configuration system, and you'll need some kind of "sysroot" taken from the target (headers, libraries, anything else you'll need to reference in a build).
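For a sense of what pointing a build at a sysroot looks like, here is a small sketch that only assembles a command line and does not run it. It assumes the C side of the build uses Clang, whose `--target=` and `--sysroot=` options select the target triple and the directory holding the target's headers and libraries. The function name and the example paths are hypothetical.

```python
from pathlib import Path


def cross_compile_command(c_file: str, triple: str, sysroot: str) -> list[str]:
    """Build (but do not run) a Clang command that compiles c_file for another target."""
    obj_file = str(Path(c_file).with_suffix(".o"))
    return [
        "clang",
        f"--target={triple}",    # which platform to generate code for
        f"--sysroot={sysroot}",  # where the target's headers and libraries live
        "-c", c_file,
        "-o", obj_file,
    ]


# Example: compile foo.c for Linux using a sysroot copied from a Linux machine.
cmd = cross_compile_command("foo.c", "x86_64-linux-gnu", "/opt/sysroots/linux")
```

The resulting list can be passed to `subprocess.run` once the sysroot directory actually exists on the build machine.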

So, in the end it depends on how your compiler is structured. Either it's easier to re-bootstrap using historical sources, repeating each phase of language compatibility you went through in the first place (you did use source revision control, right?), or it's easier to implement a cross-compiler configuration. I can't tell you which from here.
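The re-bootstrap option can be thought of as replaying a chain, where each historical revision must be compilable by the binary produced in the previous step. The sketch below models that with made-up language version numbers; the `Revision` fields and the `replay` function are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    tag: str
    written_in: int  # newest language version used by this revision's own source
    implements: int  # language version the resulting compiler binary accepts


def replay(seed_accepts: int, history: list[Revision]) -> list[str]:
    """Return the build order, or raise if a revision cannot be compiled
    by the binary produced in the previous step."""
    accepts = seed_accepts
    order = []
    for rev in history:
        if rev.written_in > accepts:
            raise RuntimeError(
                f"{rev.tag} needs language v{rev.written_in}, "
                f"but the current binary only accepts v{accepts}"
            )
        order.append(rev.tag)
        accepts = rev.implements
    return order


# The seed is the original C-written compiler, which accepts language v1.
history = [Revision("r1", written_in=1, implements=2),
           Revision("r2", written_in=2, implements=3)]
build_order = replay(1, history)
```

Skipping an intermediate revision makes `replay` fail, which is the reason every phase has to be recoverable from revision control.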

For many years, the GCC compiler was always written only in standard-compliant C code for exactly this reason: they wanted to be able to bring it up on any OS, given only the native C compiler for that system. Only in 2012 was it decided that C++ is now sufficiently widespread that the compiler itself can be written in it. Even then, they're only permitting themselves a subset of the language. In future, if anybody wants to port GCC to a platform that does not already have C++, they will need to either use a cross-compiler, or first port GCC 4.7 (the last major C-only version) and then move to the latest.

Additionally, the GCC build process does not "trust" the compiler it was built with. When you type "make", it first builds a reduced version of itself, then uses that to build a full version. Finally, it uses the full version to rebuild another full version, and compares the two binaries. If the two do not match, it knows that the original compiler was buggy and introduced some bad code, and the build fails.
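The staged build and comparison described above can be sketched as a toy model. This is not GCC's actual build logic: a compiler "binary" is just a bytes value here, and `toy_run` is a made-up stand-in for running one compiler on the compiler's own source.

```python
from typing import Callable

# Toy model: running a compiler binary on source text yields a new binary.
Run = Callable[[bytes, str], bytes]


def bootstrap(native_cc: bytes, source: str, run: Run) -> bytes:
    """Build three stages and require the last two to be identical."""
    stage1 = run(native_cc, source)  # built by the untrusted native compiler
    stage2 = run(stage1, source)     # built by stage1
    stage3 = run(stage2, source)     # built by stage2
    if stage2 != stage3:
        raise RuntimeError("stage2 and stage3 differ; bootstrap failed")
    return stage3


def toy_run(compiler: bytes, source: str) -> bytes:
    # The output's behaviour tag comes from the source; its exact bytes also
    # record the behaviour tag of the compiler that produced it.
    behaviour = compiler.split(b"|")[0]
    return source.encode() + b"|built-with:" + behaviour


final = bootstrap(b"vendor-cc|built-with:unknown", "foo-src", toy_run)
```

In this model stage1 differs from stage2 because a different compiler generated it, but stage2 and stage3 come from compilers that behave identically, so they should match byte for byte. A stage1 that misbehaves would produce a stage2 that differs from stage3, and the check would catch it.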


 