Why can't the linker prevent the C++ static initialization order fiasco?

EDIT: Changed example below to one that actually demonstrates the SIOF.

I am trying to understand all of the subtleties of this problem, because it seems to me to be a major hole in the language. I have read that it cannot be prevented by the linker, but why is this so? It seems trivial to prevent in simple cases, like this:

// A.h
extern int x;

// A.cpp
#include <cstdlib>

int x = rand();

// B.cpp
#include "A.h"
#include <iostream>

int y = x;

int main()
{
    std::cout << y; // prints the random value (or garbage)?
}

Here, the linker should be able to easily determine that the initialization code for A.cpp should happen before B.cpp in the linked executable, because B.cpp depends on a symbol defined in A.cpp (and the linker obviously already has to resolve this reference).

So why can't this be generalized to all compilation units? If the linker detects a circular dependency, can't it just fail the link with an error (or perhaps a warning, since I suppose it may be the programmer's intent to define a global symbol in one compilation unit and initialize it in another)?

Does the standard levy any requirements on an implementation to ensure the proper initialization order in simple cases? What is an example of a case where this would not be possible?

I understand that an analogous situation can occur at global destruction time. If the programmer does not carefully ensure that the dependencies during destruction are symmetrical to construction, a similar problem occurs. Could the linker not warn about this scenario as well?

Linkers traditionally just link, i.e. they resolve addresses. You seem to be wanting them to do semantic analysis of the code. But they don't have access to semantic information, only a bunch of object code. Modern linkers at least can handle large symbol names and discard duplicate symbols to make templates more usable, but so long as linkers and compilers are independent, that's about it. Of course if both linker and compiler are developed by the same team, and if that team is a big corporation, more intelligence can be put in the linker, but it's hard to see how a standard for a portable language can mandate such a thing.

If you want to know more about linkers, BTW, take a look at http://www.iecc.com/linker/ - about the only book on an often ignored tool.

In theory, there's nothing preventing a linker from handling this: basically, do a topological sort among the dependencies to come up with an initialization order. Existing linkers don't do it though, and C++ mostly depends on existing linkers...
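To make the topological-sort idea concrete, here is a minimal sketch using Kahn's algorithm. It assumes a linker somehow had a per-object-file list of "must be initialized first" dependencies; the `DepGraph` type and the `init_order` function are hypothetical names for illustration and do not correspond to anything an existing linker provides.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each object file maps to the object files whose
// symbols its initializers read, i.e. the files that must be initialized first.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Kahn's algorithm. Fills `order` with a valid initialization order and
// returns true, or returns false when the files depend on each other in a
// cycle, in which case no file-level order exists.
bool init_order(const DepGraph& deps, std::vector<std::string>& order) {
    std::map<std::string, int> pending;                     // unmet dependencies per file
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& entry : deps) {
        pending.insert(std::make_pair(entry.first, 0));
        for (const auto& dep : entry.second) {
            pending.insert(std::make_pair(dep, 0));
            ++pending[entry.first];
            users[dep].push_back(entry.first);
        }
    }

    std::queue<std::string> ready;  // files with nothing left to wait for
    for (const auto& entry : pending)
        if (entry.second == 0) ready.push(entry.first);

    order.clear();
    while (!ready.empty()) {
        const std::string file = ready.front();
        ready.pop();
        order.push_back(file);
        for (const auto& user : users[file])
            if (--pending[user] == 0) ready.push(user);
    }
    return order.size() == pending.size();  // leftover files mean a cycle
}
```

For the example in the question, a graph in which B.o depends on A.o yields the order A.o, B.o. If A.o also depended on B.o, the function would return false, which is the point where a linker could issue the error or warning the question asks about.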

Edit: From the viewpoint of the standard, the solution to this problem is utterly trivial: one sentence requiring that all objects with static storage duration be initialized before main() begins execution. Unfortunately, about all that would accomplish is creating another area in which virtually nobody conforms with the standard or (worse) even has a plan to do so. For it to mean anything, the implementers on the committee have to agree that it's sufficiently important that they're going to implement it.

You're right that it's easy to look around and see that people have problems with this. At the same time, I don't know of a single vendor who seems to consider it a real problem. None of them seems to have worked on it yet. None of them has it scheduled for a future release. As far as I can see, it hasn't even made it onto anybody's "it would be nice if we could someday" list.

That brings us back to what I originally said: even though it may look like a serious problem to us as users, it apparently doesn't look that way to most implementers. I can see a number of reasons that might be so. First, of course, is that C++ isn't a key item in anybody's corporate agenda. Microsoft pushes .NET. Sun/Oracle and IBM push Java. Others have their own agendas, but none of them is trying to get you to use C++. It looks to me like most of them consider it a necessary evil, not something to which they really want to devote any effort at all. That being the case, completely re-designing the guts of their linker to handle this particular problem would probably only be open to consideration if they got a lot of complaints about it. That has two problems. First of all, C++ starts out as a fairly small community, so it would take a huge percentage of them before implementers really noticed anything they said. Second, only a fairly small percentage of C++ programmers really run into problems with this anyway. About the only other reason they'd bother or care would be if it became an issue for their own internal development. Unfortunately, most have little reason to care about portability.

It's because static initialization is a completely different animal than runtime initialization. The initialization of x is, by its nature in your example, dynamic. But it is written as a static initialization. This comes mostly from compatibility with decades of C practice.

One way of resolving such a construct is to compile initialization code for each module which runs before main(), like #pragma startup does in some implementations.
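The difference between the two kinds of initialization can be sketched in a single translation unit, where the order is well defined. This is only an illustration: `init_count` and `traced` are hypothetical helpers introduced here to make the startup code observable, and they are not part of the question's program.

```cpp
#include <cstdlib>

// Constant initializer: the value can be stored directly in the executable
// image, so no code has to run at startup.
int init_count = 0;

// Hypothetical helper: counts each initializer that actually executes.
int traced(int value) {
    ++init_count;
    return value;
}

int k = 5;                    // constant initialization, like init_count
int x = traced(std::rand());  // dynamic: needs code that runs before main()
int y = traced(x);            // dynamic, and ordered after x because both
                              // definitions are in the same translation unit
```

By the time main() starts, init_count is 2 and y equals x, because the implementation has already run the two dynamic initializers in definition order. Only when x and y live in different translation units does their relative order become unspecified.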

But really, how often does the declaration module not know what the initialization values are?

In your simple example, a sufficiently smart linker could indeed work out that the initializations in A.o need to run before those in B.o, because B.o refers to symbols that are defined in A.o.

But examples as simple as yours don't really demonstrate much of a problem, certainly not something of the "fiasco" level. Here's a slightly more complicated example.

// externs.h
extern int a;
extern int b;

// A.cpp
#include "externs.h"

int a = 5;
int aa = b;

// B.cpp
#include "externs.h"
int b = 10;
int bb = a;

The standard requires that variables in a single compilation unit be initialized in declaration order, so a must be initialized before aa, and b must be initialized before bb, but there aren't any further ordering requirements. Initializations from one compilation unit are allowed to be interleaved with those from other compilation units.

There is at least one initialization order that would ensure all variables are initialized before they get used to initialize anything else, while still obeying the standard:

  1. a
  2. b
  3. bb
  4. aa

The linker has only limited information about this program. It knows that the compiled file A.o defines two symbols, a and aa, and that it refers to an external symbol b. Likewise, it knows that B.o defines b and bb and refers to external symbol a. The two object files are mutually dependent, so the linker cannot use the same technique it could have used for your example. In this example, it needs to know that only a has to be defined in order to initialize B.o. The information recorded in the object files, though, doesn't get that specific. It doesn't contain dependencies between symbols.

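Traditional linkers don't look at the source code or even an AST, and existing object file formats provide fairly little information about exported and external symbols.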
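The granularity problem in the a/aa/b/bb example can be sketched as follows. The code assumes a hypothetical object-file format that records, for every symbol, which other symbols must be initialized first; as noted above, real object files do not carry this information. `Deps`, `visit`, `symbol_order` and `example_symbol_deps` are names made up for this illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: node -> nodes that must be initialized before it.
using Deps = std::map<std::string, std::vector<std::string>>;

// Depth-first visit; returns false if the current path closes a cycle.
static bool visit(const std::string& node, const Deps& deps,
                  std::set<std::string>& done, std::set<std::string>& active,
                  std::vector<std::string>& order) {
    if (done.count(node)) return true;
    if (!active.insert(node).second) return false;  // already on this path
    const auto it = deps.find(node);
    if (it != deps.end())
        for (const auto& dep : it->second)
            if (!visit(dep, deps, done, active, order)) return false;
    active.erase(node);
    done.insert(node);
    order.push_back(node);  // every dependency is already in `order`
    return true;
}

// Appends a valid initialization order to `order`, or returns false on a cycle.
bool symbol_order(const Deps& deps, std::vector<std::string>& order) {
    std::set<std::string> done, active;
    for (const auto& entry : deps)
        if (!visit(entry.first, deps, done, active, order)) return false;
    return true;
}

// Symbol-level view of A.cpp and B.cpp: declaration order within each file
// (a before aa, b before bb) plus what each initializer reads.
Deps example_symbol_deps() {
    Deps d;
    d["a"] = {};
    d["b"] = {};
    d["aa"] = {"a", "b"};
    d["bb"] = {"b", "a"};
    return d;
}
```

Run on the symbol-level graph, symbol_order succeeds; this particular traversal produces a, b, aa, bb, which is as valid as the a, b, bb, aa order listed above. Run on the file-level graph that a linker actually sees (A.o needs B.o, and B.o needs A.o), the same function reports a cycle.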

While the linker could perhaps do that, most examples of where you would need it to do so are also examples of bad code lacking cohesion and having high coupling (usually through the horror of global variables). Your example being such an exemplar.

So it is hardly a "fiasco"; that is probably too strong a description. It is merely a minor restriction on the way you might code.

Any language standard is a compromise among many things. In this case, we're talking about a compromise between ease of implementation and ease of use. If a language is too hard to implement, there will be few or no conforming implementations, and the standard will be useless. If it's too hard to use, nobody will use it, and the standard also will be useless.

Language standard committees will therefore try to limit the demands they place on the implementation, particularly on the more common systems. In modern systems, it's very common to have various different compilers but a shared linker, and therefore a committee will feel much freer to make demands on the compiler writers but go easier on the linkers.

C++ function overloading depended on finding a trick to make it work on existing linkers ("name mangling"). The C90 standard said that variable names with external linkage had to be unique in the first six characters, without distinguishing case. The rationale (for the 1989 ANSI version; IIRC it was dropped for the 1990 ISO standard) said that the committee was very unhappy about keeping that restriction, but felt that dropping it would make it too difficult to implement standard C on too many systems with primitive linkers.
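As a rough illustration of the name-mangling trick, the sketch below encodes the parameter types into the symbol name, so that a linker which only understands flat names still sees two overloads as two distinct symbols. It loosely imitates the Itanium C++ ABI encoding for free functions with a few builtin parameter types; the `mangle` function is a hypothetical toy, not any compiler's actual implementation.

```cpp
#include <string>
#include <vector>

// Toy mangler: "_Z", then the length-prefixed function name, then one code
// per parameter type. Only int, double and char are handled in this sketch.
std::string mangle(const std::string& name,
                   const std::vector<std::string>& params) {
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) return out + "v";  // empty parameter list is encoded as void
    for (const auto& p : params) {
        if (p == "int") out += 'i';
        else if (p == "double") out += 'd';
        else if (p == "char") out += 'c';
        else out += '?';  // type not covered by this sketch
    }
    return out;
}
```

With this scheme, foo(int) becomes _Z3fooi and foo(double) becomes _Z3food, so the overloads can coexist in one symbol table without the linker knowing anything about C++ types.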

There is something of a chicken-and-egg situation here: language designers are reluctant to put demands on linkers, and therefore there's no great push for linkers to evolve. But that's the way things currently work.


 