

How do languages like C# and Java avoid C/C++-like independent compilation?

Originally posted 2009-03-28 20:23:33 · Tags: c#, java, compiler-construction, programming-languages

For my programming languages class, I'm writing a research paper on some papers by important people in the history of language design. One by C.A.R. Hoare struck me as odd because it argues against the independent compilation technique used in C, and later C++, even before C became popular.

Since this is primarily an optimization to speed up compilation, what is it about Java and C# that makes them able to avoid relying on independent compilation? Is it a compiler technique, or are there elements of the language that facilitate this? And are there any other compiled languages that used these techniques before them?

6 answers

Short answer: Java and C# don't avoid separate compilation; they make full use of it.

Where they differ is that they don't require the programmer to write a pair of separate header/implementation files when writing a reusable library. The user writes the definition of a class once, and the compiler extracts the information equivalent to the "header" from that single definition and includes it in the output file as "type metadata". So the output file (a .jar full of .class files in Java, or a .dll assembly in .NET-based languages) is a combination of binaries AND headers in a single package.

Then when another class is compiled and it depends on the first class, the compiler can look at the metadata instead of having to find a separate include file.
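A minimal Java sketch of that idea: the class below is written once, with no separate header, and its public interface is then recovered from the compiled class's metadata using reflection. javac reads .class files directly rather than going through reflection, so this only illustrates what kind of information the binary carries; the names `MetadataDemo`, `Account` and `publicSignatures` are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MetadataDemo {
    // A "library" class: one definition, no separate header file.
    public static class Account {
        private long balanceCents;
        public void deposit(long cents) { balanceCents += cents; }
        public long balance() { return balanceCents; }
    }

    // Rebuild header-like declarations (return type, name, parameter types)
    // for the public methods, using only the compiled class's metadata.
    static List<String> publicSignatures(Class<?> type) {
        List<String> result = new ArrayList<>();
        Method[] methods = type.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // stable, readable order
        for (Method m : methods) {
            if (!Modifier.isPublic(m.getModifiers())) continue;
            StringBuilder sig = new StringBuilder();
            sig.append(m.getReturnType().getSimpleName()).append(' ').append(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) sig.append(", ");
                sig.append(params[i].getSimpleName());
            }
            sig.append(')');
            result.add(sig.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "long balance()" and "void deposit(long)".
        publicSignatures(Account.class).forEach(System.out::println);
    }
}
```

Parameter names are deliberately left out: they are only kept in the class file when compiling with `javac -parameters`, whereas the types are always there, and the types are what a dependent compilation needs.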

It happens that they target a virtual machine rather than a specific chip architecture, but that's a separate issue; they could put x86 machine code in as the binary and still have the header-like metadata in the same file as well (this is in fact an option in .NET, albeit rarely used).

In C++ compilers it is common to try to speed up compilation by using "pre-compiled headers". The metadata in .NET .dll and .class files is much like a pre-compiled header: already parsed and indexed, ready for rapid look-ups.

The upshot is that in these modern languages there is one way of doing modularization, and it has the characteristics of a perfectly organised and hand-optimised C++ modular build system. Pretty nifty, speaking ASFAC++B.

IMO, one of the biggest factors here is that both Java and .NET use intermediate languages; that means the compiled unit (jar/assembly) contains, as a prerequisite, a lot of expressive metadata about the types, methods, etc., so it is already laid out conveniently for reference checking. The runtime still checks anyway, in case you are pulling a fast one ;-p
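The JVM resolves each method reference in a class file by name and descriptor when it links that reference (typically on first use), and throws `NoSuchMethodError` if the target class no longer has a matching method. The sketch below is only an analogy for that runtime check, not the JVM's linker: it looks a method up against a class's metadata by name and parameter types through reflection. `RuntimeCheckDemo`, `Greeter` and `resolve` are hypothetical names.

```java
import java.lang.reflect.Method;

public class RuntimeCheckDemo {
    public static class Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    // Look up a public method by name and exact parameter types in the class metadata.
    // A mismatch is detected at run time, similar in spirit to a stale caller
    // hitting NoSuchMethodError.
    static String resolve(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getMethod(name, params);
            return "found " + m.getName();
        } catch (NoSuchMethodException e) {
            return "missing " + name;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Greeter.class, "greet", String.class)); // found greet
        System.out.println(resolve(Greeter.class, "greet", int.class));    // missing greet
    }
}
```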

This isn't very far removed from the MIDL that underpins COM, although there the TLB is often a separate entity.

If I've misunderstood your meaning, please let me know...

You could consider a Java .class file to be similar to a precompiled header file in C/C++. Essentially, a .class file contains both the intermediate form that a C/C++ linker would need and all of the information contained in the header (Java just doesn't have a separate header).

From your comment on another post:

"I'm basically meaning the idea in C/C++ that each source file is its own individual compilation unit. This doesn't as much seem to be the case in C# or Java."

In Java (I cannot speak for C#, but I assume it is the same) each source file is its own individual compilation unit. I am not sure why you would think it is not... perhaps we have different definitions of compilation unit?

It requires some language support (otherwise, C/C++ compilers would do it too).

In particular, it requires that the compiler generate self-contained modules, which expose metadata that other modules can reference in order to call into them.

.NET assemblies are a straightforward example. All the files in a project are compiled together, generating one dll. This dll can be queried by .NET to determine which types it contains, so that other assemblies can call functions defined in it.

And to make use of this, it must be legal in the language to reference other modules.

In C++, what defines the boundary of a module? The language specifies that the compiler only considers data in its current compilation unit (.cpp file + included headers). There is no mechanism for specifying "I'd like to call function Foo in module Bar, even though I don't have the prototype or anything for it at compile-time". The only mechanism you have for sharing type information between files is #include.

There is a proposal to add a module system to C++, but it won't be in C++0x. Last I saw, the plan was to consider it for a technical report (TR) after 0x is out.

(It's worth mentioning that the #include system in C/C++ was originally used because it sped up compilation. Back in the 70s, it allowed the compiler to process the code in a simple linear scan; it didn't have to build syntax trees or other such "advanced" features. Today, the tables have turned and it has become a huge bottleneck, both in terms of usability and compilation speed.)
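To make the "linear scan" point concrete, here is a toy sketch (in Java, over a hypothetical in-memory map of files) of purely textual inclusion: each `#include "..."` line is replaced by the recursively expanded text of the named file. It is not a real C preprocessor and has no include guards or cycle detection. It shows why the cost grows: every translation unit that includes `vec.h` gets its own full copy of the header text to re-parse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IncludeDemo {
    // Hypothetical in-memory "file system": file name -> lines.
    static final Map<String, List<String>> FILES = Map.of(
            "vec.h", List.of("struct vec { int x, y; };", "int dot(struct vec a, struct vec b);"),
            "a.c", List.of("#include \"vec.h\"", "int f(void) { return 1; }"),
            "b.c", List.of("#include \"vec.h\"", "int g(void) { return 2; }"));

    // Textual inclusion in one linear pass: an #include line is replaced by the
    // expanded contents of the named file; every other line is copied through.
    static List<String> expand(String file) {
        List<String> lines = FILES.get(file);
        if (lines == null) throw new IllegalArgumentException("no such file: " + file);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#include \"") && line.endsWith("\"")) {
                String target = line.substring("#include \"".length(), line.length() - 1);
                out.addAll(expand(target));
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each translation unit receives (and must re-parse) the whole header.
        for (String unit : List.of("a.c", "b.c")) {
            System.out.println(unit + ": " + expand(unit).size() + " lines after preprocessing");
        }
    }
}
```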

Object files generated by C/C++ compilers can only be read by the linker, not by the compiler.

As to other languages: IIRC Turbo Pascal had "units", which you could use without having any source code. I think the point is to create metadata along with the compiled code, which the compiler can then use to figure out the interface to the module (i.e. signatures of functions, class layout, etc.).
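A deliberately abstract Java sketch of that "metadata plus compiled code" packaging. It does not model Turbo Pascal's actual unit file format; `UnitDemo`, `CompiledUnit`, `exports` and `checkCall` are hypothetical, and the code bytes are placeholders. The point is only that a caller can be type-checked against the exported signatures without the unit's source.

```java
import java.util.List;
import java.util.Map;

public class UnitDemo {
    // Hypothetical compiled unit: exported signatures (the "interface")
    // stored alongside opaque compiled code (the "implementation").
    static final class CompiledUnit {
        final String name;
        final Map<String, List<String>> exports; // function name -> parameter types
        final byte[] code;                       // never inspected by the type checker

        CompiledUnit(String name, Map<String, List<String>> exports, byte[] code) {
            this.name = name;
            this.exports = exports;
            this.code = code;
        }
    }

    // Check a call using only the unit's exported metadata; no source code is involved.
    static boolean checkCall(CompiledUnit unit, String function, List<String> argTypes) {
        List<String> params = unit.exports.get(function);
        return params != null && params.equals(argTypes);
    }

    public static void main(String[] args) {
        CompiledUnit mathUnit = new CompiledUnit(
                "MathUtils",
                Map.of("Square", List.of("Integer")),
                new byte[] {0x01, 0x02}); // placeholder bytes standing in for machine code
        System.out.println(checkCall(mathUnit, "Square", List.of("Integer"))); // true
        System.out.println(checkCall(mathUnit, "Square", List.of("Real")));    // false
        System.out.println(checkCall(mathUnit, "Cube", List.of("Integer")));   // false
    }
}
```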

Another problem with C/C++ that prevents simply replacing #include with some kind of #import is the preprocessor, which can completely change the meaning and even the syntax of included/imported modules. That would be very difficult (if not impossible) to support with a Java-like module system.
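A toy illustration of that problem, again in Java over a hypothetical in-memory set of files: it supports only object-like `#define NAME VALUE` and quoted `#include`, with simple identifier substitution and no rescanning, so it is far from a real C preprocessor. Because macros defined before an `#include` are visible inside the included file, the same `num.h` produces two different declarations, which means no single precompiled "interface" for `num.h` could serve both includers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MacroDemo {
    // Hypothetical files: one header, two translation units that configure it differently.
    static final Map<String, List<String>> FILES = Map.of(
            "num.h", List.of("NUM_T add(NUM_T a, NUM_T b);"),
            "ints.c", List.of("#define NUM_T int", "#include \"num.h\""),
            "reals.c", List.of("#define NUM_T double", "#include \"num.h\""));

    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Preprocess one translation unit starting with no macros defined.
    static List<String> preprocess(String file) {
        return preprocess(file, new HashMap<>());
    }

    // Macros are shared with included files, so a header's expansion depends on its includer.
    private static List<String> preprocess(String file, Map<String, String> macros) {
        List<String> out = new ArrayList<>();
        for (String line : FILES.get(file)) {
            if (line.startsWith("#define ")) {
                String[] parts = line.substring("#define ".length()).split(" ", 2);
                macros.put(parts[0], parts.length > 1 ? parts[1] : "");
            } else if (line.startsWith("#include \"") && line.endsWith("\"")) {
                String target = line.substring("#include \"".length(), line.length() - 1);
                out.addAll(preprocess(target, macros));
            } else {
                out.add(substitute(line, macros));
            }
        }
        return out;
    }

    // Replace each identifier that names a macro with its value (single pass, no rescanning).
    private static String substitute(String line, Map<String, String> macros) {
        Matcher m = IDENT.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(macros.getOrDefault(m.group(), m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(preprocess("ints.c"));  // [int add(int a, int b);]
        System.out.println(preprocess("reals.c")); // [double add(double a, double b);]
    }
}
```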