
Why does g++ detect undefined reference when dynamically linking

I am probably mistaken about how dynamic linking works, because I cannot figure this out. As I understood it, when a library is dynamically linked, its symbols are resolved at runtime. From this answer:

When you link dynamically, a pointer to the file being linked in (the file name of the file, for example) is included in the executable and the contents of said file are not included at link time. It's only when you later run the executable that these dynamically linked files are brought in and they're only brought into the in-memory copy of the executable, not the one on disk.

[...]

In the dynamic case, the main program is linked with the C runtime import library (something which declares what's in the dynamic library but doesn't actually define it). This allows the linker to link even though the actual code is missing.

Then, at runtime, the operating system loader does a late linking of the main program with the C runtime DLL (dynamic link library or shared library or other nomenclature).

I am confused as to why g++ seems to expect the shared object to be there when dynamically linking against it. Sure, I would expect the name of the library to be necessary so that it can be loaded at runtime, but why is the .so itself necessary at this stage? Furthermore, g++ complains about undefined references when linking against the library.

My questions are:

  1. Why does g++ seem to require the shared object when dynamically linking against it if the loading of the library only happens at runtime? I understand how the -l flag could be necessary to specify the name of the shared object so that it can be loaded at runtime, but I see no point in having to provide the path to the .so at link time (-L) or the .so itself.
  2. Why does g++ attempt to resolve the symbols when dynamically linking? Nothing stops me from having a complete .so at link time but then providing a different (incomplete) .so at runtime, which causes the program to crash when it tries to use an undefined symbol.

I made a reproducible example:

Directory structure:

.
├── main.cpp
└── test
    ├── usertest.cpp
    └── usertest.h

File contents:

test/usertest.h

#ifndef USERTEST_H_4AD3C656_8109_11E8_BED5_5BE6E678B346
#define USERTEST_H_4AD3C656_8109_11E8_BED5_5BE6E678B346

namespace usertest
{
    void helloWorld();

    // This method is not defined anywhere
    void byeWorld();
};

#endif /* USERTEST_H_4AD3C656_8109_11E8_BED5_5BE6E678B346 */

test/usertest.cpp

#include "usertest.h"
#include <iostream>

void usertest::helloWorld()
{
    std::cout << "Hello, world\n";
}

main.cpp

#include "test/usertest.h"

int main()
{
    usertest::helloWorld();
    usertest::byeWorld();
}

Usage

$ cd test
$ g++ -c -fPIC usertest.cpp
$ g++ usertest.o -shared -o libusertest.so
$ cd ..
$ g++ main.cpp -L test/ -lusertest
$ LD_LIBRARY_PATH="test" ./a.out

Expected behaviour

I would expect everything to crash when attempting to launch a.out because it cannot find the necessary symbols in libusertest.so.

Actual behaviour

Building a.out fails at link time because the linker cannot find byeWorld():

/tmp/ccVNcRRY.o: In function `main':
main.cpp:(.text+0xa): undefined reference to `usertest::byeWorld()'
collect2: error: ld returned 1 exit status

With the ELF format it indeed isn't necessary to know which symbols belong to which library, as the actual symbol resolution happens when the program is executed. By convention, though, ld still resolves the symbols when producing the binary. It's for your convenience, so that you get immediate feedback when you have missing symbols, since in that case chances are your program won't work.
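The link-time check described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not how ld is implemented: symbols are plain strings, each library is reduced to the set of names it exports, and unresolvedSymbols is a made-up helper.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of the link-time check (hypothetical helper, not ld's code):
// every symbol referenced by the object files must be exported by at
// least one of the libraries named on the command line.
std::vector<std::string> unresolvedSymbols(
        const std::set<std::string>& referenced,
        const std::vector<std::set<std::string>>& libraryExports) {
    std::vector<std::string> missing;
    for (const std::string& symbol : referenced) {
        bool found = false;
        for (const std::set<std::string>& exports : libraryExports) {
            if (exports.count(symbol) != 0) {
                found = true;
                break;
            }
        }
        // A symbol nobody exports corresponds to an "undefined reference".
        if (!found) missing.push_back(symbol);
    }
    return missing;
}
```

For the example in the question, the referenced set would hold usertest::helloWorld() and usertest::byeWorld(), the single library would export only usertest::helloWorld(), and the helper would return usertest::byeWorld(), which is the symbol ld complains about.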

Using the --warn-unresolved-symbols flag you can change ld's behavior in this case from an error to a warning:

$ g++ -Wl,--warn-unresolved-symbols main.cpp -L test/ -lusertest

This should emit a warning but still create the executable. Note that you still need to provide the library name, otherwise ld won't know where to look for the needed symbols.

On Windows, the linker needs to know exactly which symbol belongs to which library in order to produce the necessary import tables. So it is impossible to build a PE binary with unresolved symbols.

The code segment of an executable is normally mapped read-only as a security measure, so a program generally cannot modify its own code at runtime. As others have mentioned, what the linker is doing is generating a list of what symbols are provided per library.

You suggest this process could be deferred to run time, but that would mean that your binary could crash every time you launch it if the list of libraries you provided at link time was incomplete. Why would you risk that when you can simply check it at link time? Deferring symbol resolution to runtime would mean that each time you run your program it would perform the same search in all its dependencies for all unresolved symbols. Furthermore, if you did not have to give the list of libraries at link time, the loader would have to try all possible libraries at runtime. How would you resolve a symbol that is defined by multiple libraries?
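For what it is worth, the ELF dynamic linker does have a default answer to that last question: it searches the loaded objects in a fixed order and uses the first definition it finds. A toy sketch of that rule follows, with made-up names (ToyLibrary, firstDefiner) and libraries reduced to sets of strings. It assumes the search order is already known and ignores everything else a real loader does.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model: a library is a name plus the set of symbols it defines.
struct ToyLibrary {
    std::string name;
    std::unordered_set<std::string> symbols;
};

// Returns the name of the first library, in search order, that defines
// `symbol`, or an empty string when no library defines it.
std::string firstDefiner(const std::vector<ToyLibrary>& searchOrder,
                         const std::string& symbol) {
    assert(!symbol.empty());  // an empty name is never a valid symbol here
    for (const ToyLibrary& lib : searchOrder) {
        if (lib.symbols.count(symbol) != 0) return lib.name;
    }
    return "";
}
```

With two toy libraries that both define the same name, the one listed first wins, so the result depends on the order in which the libraries were given.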

As I understand (in a very simplified way), what the dynamic linker does at runtime is keep a hash table that translates those symbols into addresses (function pointers) in the dynamically linked library after it is mapped into your program's address space. In your executable, the linker needs to know which library provides each symbol (function, variable, etc.) to perform this resolution.

So, in this very simplified explanation, your call to usertest::helloWorld(); is translated to something like dynamic_resolve("usertest::helloWorld", "libusertest.so")(); with dynamic_resolve receiving the symbol name and the library name, and returning a function pointer. Internally, what dynamic_resolve (a made-up name) does is load the library "libusertest.so", retrieve the address of the function in the library, cache it in a hash table, and then return the function pointer. It is probably using the operating system's library-loading calls for this. After the first call, as the result is cached in a hash table and the library is already loaded, all subsequent calls are much cheaper.
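A minimal sketch of the dynamic_resolve idea, under loud assumptions: it deliberately avoids the real loader and simulates libraries as in-memory tables, so installedLibraries, helloWorldImpl, and slowLookups are all hypothetical names added for illustration. It only shows the "look up once, then serve from a hash table" behaviour described above.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

// Toy stand-in for a shared object: symbol name -> function pointer.
using Symbol = void (*)();
using Library = std::unordered_map<std::string, Symbol>;

void helloWorldImpl() { std::cout << "Hello, world\n"; }

// Hypothetical "disk": the libraries this toy loader is able to find.
const std::unordered_map<std::string, Library>& installedLibraries() {
    static const std::unordered_map<std::string, Library> libs = [] {
        std::unordered_map<std::string, Library> m;
        m["libusertest.so"]["usertest::helloWorld"] = &helloWorldImpl;
        return m;
    }();
    return libs;
}

// Counts slow-path lookups so the effect of the cache can be observed.
int slowLookups = 0;

// Made-up resolver: returns nullptr when the library or symbol is missing.
Symbol dynamic_resolve(const std::string& symbol, const std::string& library) {
    static std::unordered_map<std::string, Symbol> cache;
    const std::string key = library + ":" + symbol;

    auto hit = cache.find(key);
    if (hit != cache.end()) return hit->second;  // fast path: already cached

    ++slowLookups;  // slow path: search the "library" itself
    auto lib = installedLibraries().find(library);
    if (lib == installedLibraries().end()) return nullptr;
    auto sym = lib->second.find(symbol);
    if (sym == lib->second.end()) return nullptr;

    assert(sym->second != nullptr);  // toy tables never store null entries
    cache.emplace(key, sym->second);
    return sym->second;
}
```

In this model, the first dynamic_resolve("usertest::helloWorld", "libusertest.so") takes the slow path and later calls hit the cache, while asking for usertest::byeWorld yields nullptr because the toy library never defines it.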
