Why is my hello world binary mostly zeroes?

I've compiled

#include <stdio.h>

int main() {
    printf("Hello world");
    return 0;
}

on a Mac and it's 48k in size. However, when I look at the binary with xxd, most of it looks like this:

...
0000b990: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000b9a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000b9b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...

Why is it so?

otool tells me:

 otool -L hello
hello:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)

So fine, it's linked dynamically against libSystem, which is where printf lives.

Then why all the zeroes?

Because of alignment.

XNU enforces that every segment that maps part of the binary be aligned to the platform's page size. On x86_64 that is 0x1000 bytes; on arm64 it is 0x4000 bytes (even where the hardware would support 0x1000). And if the data for a segment must start at a certain aligned offset, then something in the file has to fill the gap in between, usually zeroes.
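The rounding described above can be sketched in a few lines of C. This is only an illustration of the arithmetic, not code taken from XNU or the linker; the helper names align_up and padding_after are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round `offset` up to the next multiple of `align`.
 * `align` must be a power of two (0x1000 and 0x4000 both are). */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Number of filler bytes needed after `used` bytes of real content
 * so that whatever comes next starts on an `align` boundary.
 * (Hypothetical helper, named for this sketch only.) */
static uint64_t padding_after(uint64_t used, uint64_t align)
{
    return align_up(used, align) - used;
}
```

For a hypothetical segment holding 0x200 bytes of real content, padding_after(0x200, 0x4000) comes out as 0x3e00, so almost the whole 16KB slot would be filler.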

Now, if your binary is 48KB, then its segments will probably look something like this:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100004000  File: 0x0-0x4000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100004000-0x100008000  File: 0x4000-0x8000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64  Mem: 0x100008000-0x10000c000  File: 0x8000-0xc000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64  Mem: 0x10000c000-0x100010000  File: 0xc000-0xc110 r--/r-- __LINKEDIT

For an alignment of 0x4000, that is already the minimal layout. But if you're on Intel, you can force the linker to use 0x1000 by passing -Wl,-segalign,0x1000 to the compiler. This should result in a binary that is only about 12KB:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100001000  File: 0x0-0x1000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100001000-0x100002000  File: 0x1000-0x2000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64  Mem: 0x100002000-0x100003000  File: 0x2000-0x3000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64  Mem: 0x100003000-0x100004000  File: 0x3000-0x3110 r--/r-- __LINKEDIT

If you wanted to further optimise your binary, you'd need to get rid of segments. As long as you keep imports and dynamic linking, the only one you can really get rid of is __DATA_CONST, and you can do that by targeting macOS Mojave (or older) with -mmacosx-version-min=10.14. This will leave you with just over 8KB:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100001000  File: 0x0-0x1000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100001000-0x100002000  File: 0x1000-0x2000 rw-/rw- __DATA
LC 03: LC_SEGMENT_64  Mem: 0x100002000-0x100003000  File: 0x2000-0x20f0 r--/r-- __LINKEDIT
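The sizes of the three layouts follow directly from the file offsets in the listings. As a rough sketch, assuming every file-backed segment before __LINKEDIT fits in a single aligned slot (true for the listings here, not in general) and that __LINKEDIT is not padded at the end of the file, the arithmetic looks like this. The function name estimated_file_size is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated file size when `padded_segments` file-backed segments each
 * occupy exactly one `align`-sized slot and are followed by a final
 * segment of `tail_size` bytes that is not padded in the file. */
static uint64_t estimated_file_size(unsigned padded_segments,
                                    uint64_t align,
                                    uint64_t tail_size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (uint64_t)padded_segments * align + tail_size;
}
```

Plugging in the numbers from the listings: three slots of 0x4000 plus 0x110 bytes of __LINKEDIT give 0xc110 (49424 bytes), three slots of 0x1000 plus 0x110 give 0x3110 (12560 bytes), and two slots of 0x1000 plus 0xf0 give 0x20f0 (8432 bytes).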

If you were striving for the smallest possible executable, you could further ditch __DATA and possibly even __LINKEDIT, but you'd have to substantially change your code to emit only raw syscalls, not use the dynamic linker, etc.

For any real-world application, I would also say that these zeroes effectively don't matter. With four mapped segments, the padding will never take up more than 48KB. And the bigger the binary, the smaller the percentage that the zeroes make up.
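To see what share of a particular binary really is zero bytes, a small counter is enough. The following is a minimal sketch using only the C standard library; count_zero_bytes is a name invented for this example, and it counts every zero byte in the file, not only alignment padding.

```c
#include <assert.h>
#include <stdio.h>

/* Read `f` from its current position to EOF. Returns the number of
 * zero bytes seen and stores the total number of bytes read in `*total`. */
static unsigned long count_zero_bytes(FILE *f, unsigned long *total)
{
    unsigned char buf[4096];
    unsigned long zeroes = 0;
    size_t n;

    assert(f != NULL && total != NULL);
    *total = 0;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == 0)
                zeroes++;
        }
        *total += (unsigned long)n;
    }
    return zeroes;
}
```

Opening the executable with fopen in "rb" mode, passing the handle to this function and dividing the result by the total gives the fraction of zero bytes.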

As for distribution, there's the obvious answer: xz.
Compressing the above binaries with that yields:

  • 776 bytes for the 48KB binary.
  • 736 bytes for the 12KB binary.
  • 684 bytes for the 8KB binary.
