
Need help with understanding compilation of C++ programs

I don't properly understand the compilation and linking of C++ programs. Is there a way to look at the object files generated by compiling a C++ program, in an understandable format? This should help me understand the format of object files, how C++ classes are compiled, and what information the compiler needs to generate object files, and it should help me understand statements like:

If a class is used only as an input parameter or return type, we don't need to include the whole class header file; a forward declaration is enough. But if a derived class derives from a base class, we need to include the file containing the definition of the base class (taken from "Exceptional C++").

I am reading the book "Linking and Loading" to understand the format of object files, but I would prefer something specifically tailored to C++ source code.

Thanks,

Jagrati

Edit:

I know that with nm I can look at the symbols present in the object files, but I am interested in knowing more about the object files.

First things first. Disassembling the compiler output will most probably not help you understand any of the issues you have. The output of the compiler is no longer a C++ program but plain assembly, and that is really hard to read if you do not know what the memory model is.

As for the particular question of why the definition of base is required when you declare it as a base class of derived, there are a few different reasons (and probably more that I am forgetting):

  1. When an object of type derived is created, the compiler must reserve memory for the full instance, including all base class subobjects: it must know the size of base.
  2. When you access a member attribute, the compiler must know its offset from the implicit this pointer, and that offset requires knowledge of the size taken by the base subobject.
  3. When an identifier is parsed in the context of derived and is not found in the derived class, the compiler must know whether it is defined in base before looking for it in the enclosing namespaces. Without the definition of base, the compiler cannot know whether foo(); is a valid call inside derived::function(), which it is if foo() is declared in the base class.
  4. The number and signatures of all virtual functions defined in base must be known when the compiler defines the derived class. It needs that information to build the dynamic dispatch mechanism (usually a vtable), and even to know whether a member function in derived is bound for dynamic dispatch or not: if base::f() is virtual, then derived::f() will be virtual regardless of whether the declaration in derived has the virtual keyword.
  5. Multiple inheritance adds a few other requirements, like the relative offsets of each baseX subobject, by which pointers must be adjusted before the final overriders of the methods are called (a pointer of type base2 that points to an object of type multiplyderived does not point to the beginning of the instance, but to the beginning of the base2 subobject in the instance, which might be offset by other bases declared before base2 in the inheritance list).
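
The points in the list can be sketched in a few lines of C++. This is only an illustration: the member names (a, b, x, y, g) and the helper functions are hypothetical, and the exact sizes and offsets depend on the compiler and ABI, so the code only checks properties that hold on common implementations.

```cpp
#include <cassert>

struct base {
    int a = 0;
    virtual ~base() = default;
    virtual int f() { return 1; }      // virtual in base
    int foo() { return 42; }
};

struct derived : base {
    int b = 0;
    int f() { return 2; }              // no 'virtual' keyword, still virtual (reason 4)
    int function() { return foo(); }   // foo is only found by looking into base (reason 3)
};

// Reasons 1 and 2: a derived object contains a complete base subobject,
// so neither its size nor the offset of b can be computed without base.
static_assert(sizeof(derived) >= sizeof(base),
              "derived must have room for its base subobject");

// Dynamic dispatch: the call goes to derived::f even through a base reference.
int call_through_base(base& obj) { return obj.f(); }

// Reason 5: two non-empty bases, as in the multiplyderived example.
struct base1 { int x = 0; virtual ~base1() = default; };
struct base2 { int y = 0; virtual ~base2() = default; virtual int g() { return 10; } };
struct multiplyderived : base1, base2 {
    int g() override { return 20; }
};

// On common ABIs base1 is laid out first, so the base2 subobject
// does not start at the address of the whole object.
bool base2_subobject_is_offset(multiplyderived& md) {
    base2* p2 = &md;                   // the conversion adjusts the pointer
    return static_cast<void*>(p2) != static_cast<void*>(&md);
}

int call_g_through_base2(multiplyderived& md) {
    base2* p2 = &md;
    // Converting back undoes the adjustment.
    assert(static_cast<multiplyderived*>(p2) == &md);
    return p2->g();                    // final overrider: multiplyderived::g
}
```

None of this compiles if base, base1 or base2 is only forward declared at the point where the derived class is defined, which is the rule the question quotes.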

To the last question in the comments:

So can't the instantiation of objects (except for global ones) wait until runtime, so that the size, offsets, etc. could wait until link time and we wouldn't necessarily have to deal with them when generating object files?

void f() {
   derived d;
   //...
}

The previous code allocates an object of type derived on the stack. The compiler will add assembler instructions to reserve some amount of memory for the object on the stack. After the compiler has parsed the code and generated the assembly, there is no trace of the object. In particular (assuming a trivial constructor for a POD type, i.e. nothing is initialized), that code and void f() { char array[ sizeof(derived) ]; } will produce exactly the same assembly. When the compiler generates the instruction that reserves the space, it needs to know how much.
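
A minimal sketch of that argument, assuming a hypothetical POD base with one int member. With only a forward declaration the compiler can handle pointers to derived, but it cannot reserve stack space for an object until the definition is visible.

```cpp
#include <cassert>
#include <cstddef>

struct derived;                       // forward declaration: the size is not known yet

// Fine with an incomplete type: a pointer has a known size of its own.
derived* pass_through(derived* p) { return p; }

// Would not compile at this point: the compiler cannot reserve stack space
// for a type whose size it does not know.
// void broken() { derived d; }

struct base { int a; };               // hypothetical POD base
struct derived : base { int b; };     // definition now visible, size is fixed

std::size_t bytes_for_object() {
    derived d;                        // trivial constructor, nothing is initialized
    return sizeof d;
}

std::size_t bytes_for_array() {
    char array[sizeof(derived)];      // same amount of stack space
    return sizeof array;
}

// Both functions ask the compiler for the same number of bytes.
void check_same_size() {
    assert(bytes_for_object() == bytes_for_array());
}
```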

Have you tried inspecting your binaries with readelf (provided you're on a Linux platform)? It provides pretty comprehensive information on ELF object files.

Honestly, though, I'm not sure how much this would help with understanding compilation and linking. I think the right tack is probably to get a handle on how C++ code maps to assembly before and after linking.

You normally don't need to know the internal format of the Obj files in detail, since they are generated for you. All you need to know is that for every source file you compile, the compiler generates an Obj file, which contains the machine code of your classes and functions for the platform you are compiling for. The next step, linking, puts together the object files your program needs into a single EXE or DLL (or whatever the equivalent format is on non-Windows operating systems). It could also be an EXE plus several DLLs, depending on your wishes.

The most important thing is to separate the interface (declaration) and the implementation (definition) of your class.

Always put only the interface declarations of your class in the header file. Nothing else: no implementations here. Also avoid member variables of custom types that are not pointers, because forward declarations are not enough for them and you need to include other headers in your header. If you have includes in your header, the design smells and the build process slows down as well.

All implementations of the class methods and other functions should be in the CPP file. This guarantees that somebody who includes your header depends only on your declarations and not on your implementation, and that the includes of other people's headers stay in the CPP files only.
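
A rough sketch of this layout, squeezed into one listing so that it compiles on its own. The file names in the comments and the classes widget and engine are hypothetical; in a real project each section would be a separate file.

```cpp
#include <cassert>
#include <memory>

// ---- widget.h (hypothetical): declarations only ----
class engine;                          // forward declaration instead of #include "engine.h"

class widget {
public:
    widget();
    ~widget();                         // defined in the CPP file, where engine is complete
    int run(const engine& e) const;    // parameter by reference: the declaration is enough
    int power() const;
private:
    std::unique_ptr<engine> engine_;   // pointer member: an incomplete type is fine here
};

// ---- engine.h (hypothetical) ----
class engine {
public:
    explicit engine(int hp) : hp_(hp) {}
    int horsepower() const { return hp_; }
private:
    int hp_;
};

// ---- widget.cpp (hypothetical): the only place that needs both headers ----
widget::widget() : engine_(new engine(90)) {}
widget::~widget() = default;
int widget::run(const engine& e) const { return e.horsepower(); }
int widget::power() const { return engine_->horsepower(); }

// ---- a client that includes widget.h and engine.h ----
void demo() {
    widget w;
    engine e(5);
    assert(w.power() == 90);
    assert(w.run(e) == 5);
}
```

With this split, a change to the private parts of engine forces a rebuild of widget.cpp and of clients that use engine directly, but not of code that only includes widget.h.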

But why bother? The answer is that with this separation building is faster, because the code of each class is compiled into an Obj file only once. Also, if you change your class, only a small number of other object files have to be regenerated during the next build.

If you have includes in the header, then before the compiler can generate the Obj file for your class it must first process the headers of the other classes included in your header, which may in turn include other headers, and so on. There could even be a circular dependency, and then you cannot compile! And if you change something in your class, the compiler will need to regenerate a lot of other Obj files, because without this separation they become very tightly dependent on each other after some time.
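
The circular case can be sketched like this, with two hypothetical classes that refer to each other. Holding each other by value could never compile, while a forward declaration plus pointers breaks the cycle.

```cpp
#include <cassert>

struct employee;                 // forward declaration breaks the cycle

struct department {
    employee* manager;           // pointer: only the name of the type is needed
    int id;
    // employee boss;            // by value this would need the full definition first
};

struct employee {
    department* dept;            // department is already complete here
    int badge;
};

// Returns the badge number of the department's manager, or -1 if there is none.
int manager_badge(const department& d) {
    return d.manager ? d.manager->badge : -1;
}

void demo() {
    employee e{nullptr, 7};
    department d{&e, 1};
    e.dept = &d;
    assert(manager_badge(d) == 7);
}
```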

nm is a Unix tool that shows you the names of the symbols in an object file.

objdump is a GNU tool that shows you more information.

Both tools, however, show you quite raw information that is used by the linker and is not designed to be read by human beings. That will probably not help you to better understand what happens at the C++ level.
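
If you still want to try, a small translation unit like the following is a reasonable thing to feed to these tools. The file name shape.cpp and the classes are hypothetical, and the commands in the comments assume a GNU toolchain; what exactly they print depends on the compiler and platform.

```cpp
// shape.cpp (hypothetical file name)
//
// Compile to an object file without linking:  g++ -c shape.cpp -o shape.o
// List the symbols with demangled names:      nm -C shape.o
// Disassemble with demangled names:           objdump -d -C shape.o
// Dump the ELF symbol table (Linux):          readelf -s shape.o
#include <cassert>

class shape {
public:
    virtual ~shape() {}
    virtual double area() const = 0;
};

class square : public shape {
public:
    explicit square(double side) : side_(side) {}
    double area() const override;
private:
    double side_;
};

// Out-of-line definition: this function is defined in shape.o.
double square::area() const { return side_ * side_; }

// extern "C" turns off C++ name mangling, so this symbol keeps its plain name.
extern "C" double square_area(double side) { return square(side).area(); }

void self_check() {
    assert(square_area(3.0) == 9.0);
}
```

Comparing the nm output with and without -C is a quick way to see how the compiler mangles member function names, and the plain square_area symbol shows what extern "C" changes.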

I'm reading "Introduction to GCC" (http://www.network-theory.co.uk/docs/gccintro/). It has given me good insight into linking and compiling. It's at a beginner's level, but I don't mind.
