Why does the PLT exist in addition to the GOT, instead of just using the GOT?

Question

I understand that in a typical ELF binary, functions are called through the Procedure Linkage Table (PLT). The PLT entry for a function usually contains a jump through a Global Offset Table (GOT) entry. Initially this GOT entry refers to code that loads the function's actual address into the GOT; after the first call it contains the actual function address (lazy binding).

To be precise, before lazy binding has happened, the GOT entry points back into the PLT, to the instructions that follow the jump through the GOT. These instructions usually jump to the head of the PLT, from where a binding routine is called, which then updates the GOT entry.

Now I'm wondering why there are two indirections (calling into the PLT and then jumping to an address taken from the GOT) instead of dispensing with the PLT and calling the address from the GOT directly. That looks like it would save a jump and the entire PLT. You would of course still need some code that calls the binding routine, but that code could live outside the PLT.

Is there anything I am missing? What is (or was) the purpose of the extra PLT?

Update: As suggested in the comments, I created some (pseudo-)code ASCII art to further explain what I'm referring to:

This is the situation before lazy binding in the current PLT scheme, as far as I understand it (some indirections between the PLT and printf are represented by "..."):

Program                PLT                                 printf
+---------------+      +------------------+                +-----+
| ...           |      | push [0x603008]  |<---+       +-->| ... |
| call j_printf |--+   | jmp [0x603010]   |----+--...--+   +-----+
| ...           |  |   | ...              |    |
+---------------+  +-->| jmp [printf@GOT] |-+  |
                       | push 0xf         |<+  |
                       | jmp 0x400da0     |----+
                       | ...              |
                       +------------------+

… and after lazy binding:

Program                PLT                       printf
+---------------+      +------------------+      +-----+
| ...           |      | push [0x603008]  |  +-->| ... |
| call j_printf |--+   | jmp [0x603010]   |  |   +-----+
| ...           |  |   | ...              |  |
+---------------+  +-->| jmp [printf@GOT] |--+
                       | push 0xf         |
                       | jmp 0x400da0     |
                       | ...              |
                       +------------------+
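
The scheme in the two diagrams above can be modeled in a few lines of Python. This is a toy simulation, not real machine code: a dictionary stands in for the GOT, closures stand in for the PLT entry and its push/jmp tail, and resolve is a hypothetical stand-in for the dynamic linker's binding routine. It shows that the binding routine runs only on the first call, after which the GOT slot holds the real function.

```python
class PltGotModel:
    """Toy model of PLT/GOT lazy binding; all names are illustrative only."""

    def __init__(self, library):
        self.library = library      # symbol -> real function (stands in for libc)
        self.got = {}               # symbol -> current jump target (the GOT slot)
        self.resolver_calls = 0     # how often the binding routine ran

    def make_plt_entry(self, symbol):
        # Tail of the PLT entry ("push 0xf; jmp 0x400da0"): end up in the resolver.
        def push_and_bind(*args):
            return self.resolve(symbol)(*args)

        # Before binding, the GOT slot points back into the PLT entry itself.
        self.got[symbol] = push_and_bind

        # Head of the PLT entry: "jmp [printf@GOT]", always an indirect jump.
        def plt_entry(*args):
            return self.got[symbol](*args)

        return plt_entry

    def resolve(self, symbol):
        # Hypothetical binding routine: look the symbol up and patch the GOT slot.
        self.resolver_calls += 1
        self.got[symbol] = self.library[symbol]
        return self.got[symbol]


lib = {"square": lambda x: x * x}
model = PltGotModel(lib)
square_plt = model.make_plt_entry("square")   # the call site does "call square@PLT"
print(square_plt(3), square_plt(4), model.resolver_calls)   # 9 16 1
print(model.got["square"] is lib["square"])                 # True
```

Every call still passes through plt_entry; only the target stored in the GOT slot changes.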

In my imaginary alternative scheme without a PLT, the situation before lazy binding would look like this (I kept the code in the "Lazy Binding Table" similar to that in the PLT; it could also look different, I don't care):

Program                    Lazy Binding Table                printf
+-------------------+      +------------------+              +-----+
| ...               |      | push [0x603008]  |<-+       +-->| ... |
| call [printf@GOT] |--+   | jmp [0x603010]   |--+--...--+   +-----+
| ...               |  |   | ...              |  |
+-------------------+  +-->| push 0xf         |  |
                           | jmp 0x400da0     |--+
                           | ...              |
                           +------------------+

Now, after lazy binding, the table would no longer be used:

Program                   Lazy Binding Table        printf
+-------------------+     +------------------+      +-----+
| ...               |     | push [0x603008]  |  +-->| ... |
| call [printf@GOT] |--+  | jmp [0x603010]   |  |   +-----+
| ...               |  |  | ...              |  |
+-------------------+  |  | push 0xf         |  |
                       |  | jmp 0x400da0     |  |
                       |  | ...              |  |
                       |  +------------------+  |
                       +------------------------+
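
For comparison, the imaginary scheme can be modeled the same way (again a toy Python simulation with illustrative names, not real code generation). Functionally it works: the call site goes through the GOT slot directly and the lazy binding table is used only once. What the model hides is how the call site came to be an indirect call [printf@GOT] in the first place, which is what the answers below address.

```python
class DirectGotModel:
    """Toy model of the PLT-less alternative: call sites call through the GOT."""

    def __init__(self, library):
        self.library = library      # symbol -> real function
        self.got = {}               # symbol -> current call target (the GOT slot)
        self.resolver_calls = 0

    def add_lazy_entry(self, symbol):
        # "Lazy Binding Table" entry ("push 0xf; jmp 0x400da0"): enter the resolver.
        def lazy_entry(*args):
            return self.resolve(symbol)(*args)

        self.got[symbol] = lazy_entry   # the GOT slot starts out pointing at the table

    def call(self, symbol, *args):
        # The call site itself: "call [printf@GOT]", with no PLT stub in between.
        return self.got[symbol](*args)

    def resolve(self, symbol):
        # Hypothetical binding routine: patch the GOT slot with the real address.
        self.resolver_calls += 1
        self.got[symbol] = self.library[symbol]
        return self.got[symbol]


lib = {"square": lambda x: x * x}
model = DirectGotModel(lib)
model.add_lazy_entry("square")
print(model.call("square", 3), model.call("square", 4), model.resolver_calls)  # 9 16 1
```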

Answer 1

The problem is that replacing call printf@PLT with call [printf@GOTPLT] requires that the compiler know that the function printf exists in a shared library and not in a static library (or even just a plain object file). The linker can change call printf into call printf@PLT, jmp printf into jmp printf@PLT, or even mov eax, printf into mov eax, printf@PLT, because all it is doing is changing a relocation based on the symbol printf into a relocation based on the symbol printf@PLT. The linker can't change call printf into call [printf@GOTPLT] because it can't tell from the relocation whether the instruction is a CALL, a JMP, or something else entirely. Without knowing whether it's a CALL instruction, it doesn't know whether it should change the opcode from a direct CALL to an indirect CALL.
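
A small Python sketch of why a symbol-based relocation is instruction-agnostic. The relocation format here is simplified and hypothetical (an offset plus a computation kind, loosely modeled on x86 PC-relative and absolute 32-bit relocations), and the addresses are made up. Redirecting printf to printf@PLT only changes the 4-byte value written into each field; the opcode bytes in front of the field (E8 call, E9 jmp, B8 mov eax) are never inspected, so the linker has no basis for turning a direct E8 call into an indirect one.

```python
import struct


def apply_relocation(code, section_addr, offset, kind, symbol_addr):
    """Patch a 4-byte field; the opcode bytes around it are never read."""
    field_addr = section_addr + offset
    if kind == "pc32":
        # PC-relative: displacement from the end of the 4-byte field
        # (which is also the end of a call/jmp rel32 instruction)
        struct.pack_into("<i", code, offset, symbol_addr - (field_addr + 4))
    elif kind == "abs32":
        # Absolute: the symbol's address itself
        struct.pack_into("<I", code, offset, symbol_addr)
    else:
        raise ValueError(f"unknown relocation kind: {kind}")


code = bytearray(
    b"\xE8\x00\x00\x00\x00"   # call printf       (field at offset 1)
    b"\xE9\x00\x00\x00\x00"   # jmp  printf       (field at offset 6)
    b"\xB8\x00\x00\x00\x00"   # mov  eax, printf  (field at offset 11)
)
relocations = [(1, "pc32"), (6, "pc32"), (11, "abs32")]

TEXT_ADDR = 0x401000    # hypothetical load address of this code
PRINTF_PLT = 0x401020   # hypothetical address of printf@PLT

for offset, kind in relocations:
    apply_relocation(code, TEXT_ADDR, offset, kind, PRINTF_PLT)

print(code.hex(" "))   # e8 1b 00 00 00 e9 16 00 00 00 b8 20 10 40 00
```

The opcodes E8, E9 and B8 come out unchanged; only the fields after them were written.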

However, even if there were a special relocation type indicating that the instruction is a CALL, you would still have the problem that a direct call instruction is 5 bytes long while an indirect call instruction is 6 bytes long. The compiler would have to emit code like nop; call printf@CALL to give the linker room to insert the additional byte, and it would have to do so for every call to any global function. It would probably end up being a net performance loss because of all the extra NOP instructions that are not actually necessary.
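
The length mismatch is easy to check with the actual x86 encodings. The Python sketch below encodes a direct call rel32 (opcode E8) and an indirect call [disp32] (opcode FF with ModRM byte 15); rewrite_padded_call is a hypothetical linker fix-up illustrating the answer's idea that the extra nop is what would make an in-place rewrite possible.

```python
import struct

NOP = b"\x90"


def call_direct(rel32):
    # call rel32: opcode E8 + 32-bit displacement = 5 bytes
    return b"\xE8" + struct.pack("<i", rel32)


def call_indirect(disp32):
    # call [mem]: opcode FF, ModRM 0x15 (/2 with a 32-bit displacement operand;
    # an absolute address on 32-bit x86, RIP-relative on x86-64) = 6 bytes
    return b"\xFF\x15" + struct.pack("<i", disp32)


def rewrite_padded_call(code, at, disp32):
    # Hypothetical linker step: turn "nop; call rel32" into "call [disp32]" in place.
    if code[at] != 0x90 or code[at + 1] != 0xE8:
        raise ValueError("expected nop; call rel32")
    code[at:at + 6] = call_indirect(disp32)


print(len(call_direct(0)), len(call_indirect(0)))   # 5 6

code = bytearray(NOP + call_direct(0))   # what the compiler would have to emit
rewrite_padded_call(code, 0, 0x1000)
print(len(code), code.hex(" "))          # 6 ff 15 00 10 00 00
```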

Another problem is that on 32-bit x86 targets the PLT entries are relocated at runtime. The indirect jmp [xxx@GOTPLT] instructions in the PLT don't use relative addressing the way the direct CALL and JMP instructions do, and since the address of xxx@GOTPLT depends on where the image was loaded in memory, the instruction needs to be fixed up to use the correct address. Grouping all these indirect JMP instructions together in one .plt section means that a much smaller number of virtual memory pages needs to be modified. Each 4K page that's modified can no longer be shared with other processes; if the instructions that need to be modified were scattered all over memory, a much larger part of the image would have to be unshared.
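
The page-sharing argument is simple arithmetic. This Python sketch, with a made-up layout (200 fixups, 16-byte entries, 4 KiB pages), counts how many distinct pages would have to be written to, and therefore unshared, when the same number of address fixups is scattered across the code versus packed together in one section.

```python
PAGE_SIZE = 4096


def pages_touched(fixup_addresses):
    # Every distinct 4 KiB page containing a fixup needs a private (copy-on-write) copy.
    return len({addr // PAGE_SIZE for addr in fixup_addresses})


# Hypothetical layout: 200 call sites spread through a large text section ...
scattered = [0x10000 + i * 5000 for i in range(200)]
# ... versus 200 PLT-style entries of 16 bytes packed into one page-aligned section,
# each with its address field 2 bytes into the entry.
PLT_BASE = 0x200000
grouped = [PLT_BASE + i * 16 + 2 for i in range(200)]

print(pages_touched(scattered), pages_touched(grouped))   # 200 1
```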

Note that this latter issue is only a problem for shared libraries and position-independent executables on 32-bit x86 targets. Traditional executables can't be relocated, so there's no need to fix up the @GOTPLT references, while on 64-bit x86 targets RIP-relative addressing is used to access the @GOTPLT entries.
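
Why RIP-relative addressing avoids the fix-up comes down to one subtraction. In this Python sketch (hypothetical offsets, with a 6-byte jmp [rip+disp32] as on x86-64), the absolute address of the GOT slot changes with the load address, which is what a 32-bit absolute operand would have to be patched with, while the RIP-relative displacement stays the same, so the code bytes need no patching.

```python
def rip_relative_disp(insn_addr, insn_len, target):
    # x86-64 RIP-relative operands are measured from the end of the instruction.
    return target - (insn_addr + insn_len)


JMP_OFFSET = 0x1000   # hypothetical offset of a 6-byte "jmp [rip+disp32]" in the image
SLOT_OFFSET = 0x3008  # hypothetical offset of its GOT slot in the same image

for base in (0x400000, 0x7F0000000000):   # two different load addresses
    slot_addr = base + SLOT_OFFSET          # absolute address: changes with the base
    disp = rip_relative_disp(base + JMP_OFFSET, 6, slot_addr)   # does not change
    print(hex(slot_addr), hex(disp))
# 0x403008 0x2002
# 0x7f0000003008 0x2002
```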

Because of that last point, newer versions of GCC (6.1 or later) support the -fno-plt flag. On 64-bit x86 targets this option causes the compiler to generate call printf@GOTPCREL[rip] instructions instead of call printf instructions. However, it appears to do this for any call to a function that isn't defined in the same compilation unit, that is, any function that it doesn't know for sure isn't defined in a shared library. That means indirect jumps would also be used for calls to functions defined in other object files or static libraries. On 32-bit x86 targets the -fno-plt option is ignored unless compiling position-independent code (-fpic or -fpie), in which case it results in call printf@GOT[ebx] instructions being emitted. In addition to generating unnecessary indirect jumps, this also has the disadvantage of requiring the allocation of a register for the GOT pointer, though most functions would need it allocated anyway.
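
The rule described here can be written down as a small decision function. This Python model only restates the answer's description (the target names, parameters and output strings are illustrative, and it is not how GCC is implemented). Note that it has no input for where an external function is actually defined, which is exactly why calls into other object files or static libraries also become indirect.

```python
def emit_call(symbol, target, pic, fno_plt, defined_in_tu):
    """Simplified model of the -fno-plt behaviour described above, not GCC's logic."""
    if not fno_plt or defined_in_tu:
        # Plain direct call; any PLT redirection by the linker is not modeled.
        return f"call {symbol}"
    if target == "x86-64":
        return f"call {symbol}@GOTPCREL[rip]"
    if target == "i386" and pic:          # compiled with -fpic or -fpie
        return f"call {symbol}@GOT[ebx]"  # also ties up ebx as the GOT pointer
    return f"call {symbol}"               # i386 without PIC: -fno-plt is ignored


for tgt, pic in [("x86-64", False), ("i386", False), ("i386", True)]:
    print(tgt, pic, emit_call("printf", tgt, pic, fno_plt=True, defined_in_tu=False))
# x86-64 False call printf@GOTPCREL[rip]
# i386 False call printf
# i386 True call printf@GOT[ebx]
```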

Finally, Windows is able to do what you suggest by declaring symbols in header files with the "dllimport" attribute, indicating that they exist in DLLs. This way the compiler knows whether to generate a direct or an indirect call instruction when calling the function. The disadvantage is that the symbol has to exist in a DLL, so if this attribute is used, you can't decide after compilation to link with a static library instead.

Also read Drepper's paper How To Write Shared Libraries; it explains this quite well and in detail (for Linux).

Answer 2

Now I'm wondering why there are two indirections (calling into the PLT and then jumping to an address from the GOT),

First of all, there are two calls, but just one indirection (the call to the PLT stub is direct).

instead of just sparing the PLT and calling the address from the GOT directly.

If you don't need lazy binding, you can use -fno-plt, which bypasses the PLT.

But if you want to keep lazy binding, you need some stub code to check whether the symbol has been resolved and branch accordingly. Now, to facilitate branch prediction, this stub code has to be duplicated for every called symbol, and voilà, you've reinvented the PLT.
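
The branch-prediction argument can be illustrated with a deliberately crude predictor model. Assume (a simplification of real hardware) that each indirect jump instruction remembers only its last target. If one shared stub performs the final indirect jump for every symbol, alternating calls keep overwriting each other's prediction, whereas per-symbol stubs, i.e. PLT entries, each get their own prediction slot.

```python
def count_mispredictions(calls, shared_site):
    """Toy branch-target predictor: each jump site predicts its last target."""
    last_target = {}   # jump-site id -> target seen last time (a tiny BTB)
    misses = 0
    for symbol in calls:
        site = "shared-stub" if shared_site else f"{symbol}@plt"
        if last_target.get(site) != symbol:
            misses += 1
        last_target[site] = symbol
    return misses


calls = ["printf", "puts", "printf", "puts", "printf", "puts"]
print(count_mispredictions(calls, shared_site=True))    # 6
print(count_mispredictions(calls, shared_site=False))   # 2
```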

2 answers

Answer 1
13 votes, accepted, 2017-03-28 19:27:38

Answer 2
3 votes, 2017-03-27 14:46:47