x86_64: Is it possible to "in-line substitute" PLT/GOT references?

Question

I'm not sure what a good subject line for this question is, but here we go:

In order to force code locality/compactness for a critical section of code, I'm looking for a way to call a function in an external (dynamically loaded) library through a "jump slot" (an ELF R_X86_64_JUMP_SLOT relocation) directly at the call site: what the linker ordinarily puts into the PLT/GOT, but inlined right at the call site.

If I emulate the call like this:

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push $1f\n\t"
             "jmp *0f\n\t"
             "0: .quad %P0\n"
             "1:\n\t"
             : : "i"(printf), "D"("Hello, World!\n"));
        return 0;
}

to get the space for a 64-bit word, the call itself works. (Please, no comments about this being a lucky coincidence because it breaks certain ABI rules; those are not the subject of this question and, for my case, can be worked around or addressed in other ways. I'm trying to keep this example brief.)

It creates the following assembly:

0000000000000000 <main>:
0:   bf 00 00 00 00          mov    $0x0,%edi
1: R_X86_64_32  .rodata.str1.1
5:   68 00 00 00 00          pushq  $0x0
6: R_X86_64_32  .text+0x19
a:   ff 24 25 00 00 00 00    jmpq   *0x0
d: R_X86_64_32S .text+0x11
...
11: R_X86_64_64 printf
19:   31 c0                   xor    %eax,%eax
1b:   c3                      retq

But (due to using printf as the immediate, I guess?) the target address here is still that of the PLT hook: the same R_X86_64_64 reloc. Linking the object file against libc into an actual executable results in:

0000000000400428 <printf@plt>:
  400428:       ff 25 92 04 10 00       jmpq   *1049746(%rip)        # 5008c0 <_GLOBAL_OFFSET_TABLE_+0x20>
[ ... ]
0000000000400500 <main>:
  400500:       bf 0c 06 40 00          mov    $0x40060c,%edi
  400505:       68 19 05 40 00          pushq  $0x400519
  40050a:       ff 24 25 11 05 40 00    jmpq   *0x400511
  400511:       [ .quad 400428 ]
  400519:       31 c0                   xorl   %eax, %eax
  40051b:       c3                      retq
[ ... ]
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
[ ... ]
00000000005008c0 R_X86_64_JUMP_SLOT  printf

I.e. this still gives the two-step redirection: first transfer execution to the PLT hook, then jump into the library entry point.

Is there a way I can instruct the compiler/assembler/linker to, in this example, "inline" the jump slot target at address 0x400511?

I.e. replace the "local" (resolved at program link time by ld) R_X86_64_64 reloc with the "remote" (resolved at program load time by ld.so) R_X86_64_JUMP_SLOT one (and force non-lazy loading for this section of code)? Maybe linker mapfiles might make this possible; if so, how?

Edit:
To make this clear, the question is about how to achieve this in a dynamically linked executable, for an external function that's only available in a dynamic library. Yes, it's true that static linking resolves this in a simpler way, but:

There are systems (like Solaris) where static libraries are generally not shipped by the vendor
There are libraries that aren't available as either source code or static versions

Hence static linking is not helpful here :(

Edit2:
I've found that on some architectures (notably SPARC; see the section on SPARC relocations in the GNU as manual), GNU as is able to create certain types of relocation references for the linker in place using modifiers. The quoted SPARC one would use %gdop(symbolname) to make the assembler emit instructions to the linker stating "create that relocation right here". Intel's assembler on Itanium knows the @fptr(symbol) link-relocation operator for the same kind of thing (see also section 4 in the Itanium psABI). But does an equivalent mechanism, something to instruct the assembler to emit a specific linker relocation type at a specific position in the code, exist for x86_64?

I've also found that the GNU assembler has a .reloc directive which supposedly is to be used for this purpose; still, if I try:

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push %%rax\n\t"
             "lea 1f(%%rip), %%rax\n\t"
             "xchg %%rax, (%%rsp)\n\t"
             "jmp *0f\n\t"
             ".reloc 0f, R_X86_64_JUMP_SLOT, printf\n\t"
             "0: .quad 0\n"
             "1:\n\t"
             : : "D"("Hello, World!\n"));
        return 0;
}

I get an error from the linker (note that 7 == R_X86_64_JUMP_SLOT):

error: /tmp/cc6BUEZh.o: unexpected reloc 7 in object file

The assembler creates an object file for which readelf says:

Relocation section '.rela.text.startup' at offset 0x5e8 contains 2 entries:
  Offset            Info             Type                Symbol's Value    Symbol's Name + Addend
0000000000000001  000000050000000a R_X86_64_32         0000000000000000 .rodata.str1.1 + 0
0000000000000017  0000000b00000007 R_X86_64_JUMP_SLOT  0000000000000000 printf + 0

This is what I want, but the linker doesn't take it.
The linker does accept using R_X86_64_64 instead in the above; doing that creates the same kind of binary as in the first case, redirecting to printf@plt, not the "resolved" one.

Answer 1

In order to inline the call you would need a code (.text) relocation whose result is the final address of the function in the dynamically loaded shared library. No such relocation exists (and modern static linkers don't allow them) on x86_64 using a GNU toolchain for GNU/Linux, therefore you cannot inline the entire call as you wish to do.

The closest you can get is a direct call through the GOT (avoids the PLT):

    .section    .rodata
.LC0:
    .string "Hello, World!\n"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %eax
    movq    %rax, %rdi
    call    *printf@GOTPCREL(%rip)
    nop
    popq    %rbp
    ret
    .size   main, .-main

This should generate an R_X86_64_GLOB_DAT relocation against printf in the GOT, to be used by the sequence above. You need to avoid C code because in general the compiler may use any number of caller-saved registers in the prologue and epilogue, and this forces you to save and restore all such registers around the asm function call or risk corrupting those registers for later use in the wrapper function. Therefore it is easier to write the wrapper in pure assembly.

Another option is to compile with -Wl,-z,now -Wl,-z,relro, which ensures the PLT and PLT-related GOT entries are resolved at startup to increase code locality and compactness. With full RELRO you'll only have to run code in the PLT and access data in the GOT, two things which should already be somewhere in the cache hierarchy of the logical core. If full RELRO is enough to meet your needs then you wouldn't need wrappers, and you would have added security benefits.

The best options are really static linking or LTO, if they are available to you.

Answer 2

This optimization has since been implemented in GCC. It can be enabled with the -fno-plt option and the noplt function attribute:

Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

Alternatively, the function attribute noplt can be used to avoid calls through the PLT for specific external functions.

In position-dependent code, a few targets also convert calls to functions that are marked to not use the PLT to use the GOT instead.

Answer 3

You can statically link the executable. Just add -static to the final link command, and all your indirect jumps will be replaced by direct calls.

Answer 1
3 2016-05-25 17:54:54

Answer 2
2 Accepted 2020-04-18 16:08:04

Answer 3
-1 2012-06-01 11:54:47