32-bit absolute addresses no longer allowed in x86-64 Linux?

64-bit Linux uses the small memory model by default, which puts all code and static data below the 2GB address limit. This makes sure that you can use 32-bit absolute addresses. Older versions of gcc use 32-bit absolute addresses for static arrays in order to save an extra instruction for relative address calculation. However, this no longer works. If I try to make a 32-bit absolute address in assembly, I get the linker error: "relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC". This error message is misleading, of course, because I am not making a shared object and -fPIC doesn't help.

What I have found out so far is this: gcc version 4.8.5 uses 32-bit absolute addresses for static arrays, gcc version 6.3.0 doesn't. Version 5 probably doesn't either. The linker in binutils 2.24 allows 32-bit absolute addresses, version 2.28 does not.

The consequence of this change is that old libraries have to be recompiled and legacy assembly code is broken.

Now I want to ask: When was this change made? Is it documented somewhere? And is there a linker option that makes it accept 32-bit absolute addresses?

Your distro configured gcc with --enable-default-pie, so it's making position-independent executables by default (allowing for ASLR of the executable as well as libraries). Most distros are doing that these days.

You actually are making a shared object: PIE executables are sort of a hack using a shared object with an entry point. The dynamic linker already supported this, and ASLR is nice for security, so this was the easiest way to implement ASLR for executables.

32-bit absolute relocations aren't allowed in an ELF shared object; that would stop them from being loaded outside the low 2GiB (for sign-extended 32-bit addresses). 64-bit absolute addresses are allowed, but generally you only want that for jump tables or other static data, not as part of instructions (see footnote 1).

The "recompile with -fPIC" part of the error message is bogus for hand-written asm; it's written for the case of people compiling with gcc -c and then trying to link with gcc -shared -o foo.so *.o, with a gcc where -fPIE is not the default. The error message should probably change, because many people are running into this error when linking hand-written asm.


How to use RIP-relative addressing: basics

Always use RIP-relative addressing for simple cases where there's no downside. See also footnote 1 below and this answer for syntax. Only consider using 32-bit absolute addressing when it's actually helpful for code size instead of harmful. E.g. in NASM, put default rel at the top of your file.

In AT&T syntax, use foo(%rip); in GAS .intel_syntax noprefix, use [rip + foo].


Disable PIE mode to make 32-bit absolute addressing work

Use gcc -fno-pie -no-pie to override this back to the old behaviour. -no-pie is the linker option, -fno-pie is the code-gen option. With only -fno-pie, gcc will make code like mov eax, offset .LC0 that doesn't link with the still-enabled -pie.

(clang can have PIE enabled by default, too: use clang -fno-pie -nopie. A July 2017 patch made -no-pie an alias for -nopie, for compatibility with gcc, but clang 4.0.1 doesn't have it.)


Performance cost of PIE for 64-bit (minor) or 32-bit code (major)

With only -no-pie (but still -fpie), compiler-generated code (from C or C++ sources) will be slightly slower and larger than necessary, but will still be linked into a position-dependent executable which won't benefit from ASLR. "Too much PIE is bad for performance" reports an average slowdown of 3% for x86-64 on SPEC CPU2006 (I don't have a copy of the paper, so I don't know what hardware that was on). But in 32-bit code, the average slowdown is 10%, worst case 25% (also on SPEC CPU2006).

The penalty for PIE executables is mostly for stuff like indexing static arrays, as Agner describes in the question, where using a static address as a 32-bit immediate or as part of a [disp32 + index*4] addressing mode saves instructions and registers vs. a RIP-relative LEA to get an address into a register. Also, a 5-byte mov r32, imm32 instead of a 7-byte lea r64, [rel symbol] for getting a static address into a register is nice for passing the address of a string literal or other static data to a function.

-fPIE still assumes no symbol-interposition for global variables / functions, unlike -fPIC for shared libraries which have to go through the GOT to access globals (which is yet another reason to use static for any variables that can be limited to file scope instead of global). See "The sorry state of dynamic libraries on Linux".

Thus -fPIE is much less bad than -fPIC for 64-bit code, but still bad for 32-bit because RIP-relative addressing isn't available. See some examples on the Godbolt compiler explorer. On average, -fPIE has a very small performance / code-size downside in 64-bit code. The worst case for a specific loop might only be a few %. But 32-bit PIE can be much worse.

None of these -f code-gen options make any difference when just linking, or when assembling .S hand-written asm. gcc -fno-pie -no-pie -O3 main.c nasm_output.o is a case where you want both options.


Checking your GCC config

If your GCC was configured this way, gcc -v |& grep -o -e '[^ ]*pie' prints --enable-default-pie. Support for this config option was added to gcc in early 2015. Ubuntu enabled it in 16.10, and Debian around the same time in gcc 6.2.0-7 (leading to kernel build errors: https://lkml.org/lkml/2016/10/21/904).

Related: "Build compressed x86 kernels as PIE" was also affected by the changed default.

"Why doesn't Linux randomize the address of the executable code segment?" is an older question about why it wasn't the default earlier, or was only enabled for a few packages on older Ubuntu before it was enabled across the board.


Note that ld itself didn't change its default. It still works normally (at least on Arch Linux with binutils 2.28). The change is that gcc defaults to passing -pie as a linker option, unless you explicitly use -static or -no-pie.

In a NASM source file, I used a32 mov eax, [abs buf] to get an absolute address. (I was testing if the 6-byte way to encode small absolute addresses (address-size prefix + mov eax, moffs: 67 a1 40 f1 60 00) has an LCP stall on Intel CPUs. It does.)

nasm -felf64 -Worphan-labels -g -Fdwarf testloop.asm &&
ld -o testloop testloop.o              # works: static executable

gcc -v -nostdlib testloop.o            # doesn't work
...
..../collect2  ... -pie ...
/usr/bin/ld: testloop.o: relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

gcc -v -no-pie -nostdlib testloop.o    # works
gcc -v -static -nostdlib testloop.o    # also works: -static implies -no-pie

GCC can also make a "static PIE" with -static-pie: ASLRed, but with no dynamic libraries or ELF interpreter. This is not the same thing as -static -pie; those conflict with each other (you get a static non-PIE), although that might possibly get changed.

Related: building static / dynamic executables with/without libc, defining _start or main.


Checking if an existing executable is PIE or not

This has also been asked at: "How to test whether a Linux binary was compiled as position independent code?"

file and readelf say that PIEs are "shared objects", not ELF executables. An executable with ELF type EXEC can't be a PIE.

$ gcc -fno-pie  -no-pie -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB executable, ...

$ gcc -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB shared object, ...

 ## Or with a more recent version of file:
a.out: ELF 64-bit LSB pie executable, ...

gcc -static-pie is a special thing that GCC doesn't do by default, even with -nostdlib. It shows up as "LSB pie executable, dynamically linked" with current versions of file. (See "What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?") It has ELF type DYN, but readelf shows no .interp, and ldd will tell you it's statically linked. GDB starti and /proc/<pid>/maps confirm that execution starts at the top of its _start, not in an ELF interpreter.



Semi-related (but not really): another recent gcc feature is gcc -fno-plt. Finally, calls into shared libraries can be just call [rip + symbol@GOTPCREL] (AT&T: call *puts@GOTPCREL(%rip)), with no PLT trampoline.

The NASM version of this is call [rel puts wrt ..got] as an alternative to call puts wrt ..plt. See "Can't call C standard library function on 64-bit Linux from assembly (yasm) code". This works in a PIE or non-PIE, and avoids having the linker build a PLT stub for you.

Some distros have started enabling it. It also avoids needing writeable + executable memory pages, so it's good for security against code injection. (I think modern PLT implementations don't need that either, just updating a GOT pointer rather than rewriting a jmp rel32 instruction, so there might not be a security difference.)

It's a significant speedup for programs that make a lot of shared-library calls, e.g. x86-64 clang -O2 -g compiling tramp3d goes from 41.6s to 36.8s on whatever hardware the patch author tested on. (clang is maybe a worst-case scenario for shared library calls, making lots of calls to small LLVM library functions.)

It does require early binding instead of lazy dynamic linking, so it's slower for big programs that exit right away (e.g. clang --version or compiling hello.c). This slowdown could be reduced with prelink, apparently.

This doesn't remove the GOT overhead for external variables in shared library PIC code, though. (See the Godbolt link above.)


Footnote 1

64-bit absolute addresses actually are allowed in Linux ELF shared objects, with text relocations to allow loading at different addresses (ASLR and shared libraries). This allows you to have jump tables in section .rodata, or static const int *foo = &bar; without a runtime initializer.

So mov rdi, qword msg works (NASM/YASM syntax for 10-byte mov r64, imm64, aka AT&T syntax movabs, the only instruction which can use a 64-bit immediate). But that's larger and usually slower than lea rdi, [rel msg], which is what you should use if you decide not to disable -pie. A 64-bit immediate is slower to fetch from the uop cache on Sandybridge-family CPUs, according to Agner Fog's microarch pdf. (Yes, the same person who asked this question. :)

You can use NASM's default rel instead of specifying it in every [rel symbol] addressing mode. See also "Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array" for some more description of avoiding 32-bit absolute addressing. OS X can't use 32-bit addresses at all, so RIP-relative addressing is the best way there, too.

In position-dependent code (-no-pie), you should use mov edi, msg when you want an address in a register; 5-byte mov r32, imm32 is even smaller than RIP-relative LEA, and more execution ports can run it.
