How can I prevent functions from being aligned to a 16-byte boundary when compiling for x86?

I'm working in an embedded-like environment where each byte is extremely precious, much more so than the extra cycles spent on unaligned accesses. I have some simple Rust code from an OS development example:

#![feature(lang_items)]
#![no_std]
extern crate rlibc;
#[no_mangle]
pub extern fn rust_main() {

    // ATTENTION: we have a very small stack and no guard page

    let hello = b"Hello World!";
    let color_byte = 0x1f; // white foreground, blue background

    let mut hello_colored = [color_byte; 24];
    for (i, char_byte) in hello.into_iter().enumerate() {
        hello_colored[i*2] = *char_byte;
    }

    // write `Hello World!` to the center of the VGA text buffer
    let buffer_ptr = (0xb8000 + 1988) as *mut _;
    unsafe { *buffer_ptr = hello_colored };

    loop{}

}

#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] #[no_mangle] pub extern fn panic_fmt() -> ! {loop{}}

I also use this linker script:

OUTPUT_FORMAT("binary")
ENTRY(rust_main)
phys = 0x0000;
SECTIONS
{
  .text phys : AT(phys) {
    code = .;
    *(.text.start);
    *(.text*)
    *(.rodata)
    . = ALIGN(4);
  }
  __text_end=.;
  .data : AT(phys + (data - code))
  {
    data = .;
    *(.data)
    . = ALIGN(4);
  }
  __data_end=.;
  .bss : AT(phys + (bss - code))
  {
    bss = .;
    *(.bss)
    . = ALIGN(4);
  }
  __binary_end = .;
}

I optimize it with opt-level: 3 and LTO, using an i586-targeted compiler and the GNU ld linker, with -O3 in the linker command. I've also tried opt-level: z with a matching -Os at the linker, but this produced bigger code (it didn't unroll the loop). As it stands, the size seems pretty reasonable with opt-level: 3.

Quite a few bytes seem to be wasted on aligning functions to some boundary. After the unrolled loop, 8 nop instructions are inserted, followed by an infinite loop as expected. After this, there appears to be another infinite loop, preceded by 7 nop instructions with a 16-bit operand-size override (i.e., xchg ax,ax rather than xchg eax,eax). Counting the trailing xchg ax,ax as well, this adds up to about 24 bytes wasted in a 196-byte flat binary.

  • What exactly is the optimizer doing here?
  • What options do I have to disable it?
  • Why is unreachable code being included in the binary?

The full assembly listing is below:

   0:   c6 05 c4 87 0b 00 48    movb   $0x48,0xb87c4
   7:   c6 05 c5 87 0b 00 1f    movb   $0x1f,0xb87c5
   e:   c6 05 c6 87 0b 00 65    movb   $0x65,0xb87c6
  15:   c6 05 c7 87 0b 00 1f    movb   $0x1f,0xb87c7
  1c:   c6 05 c8 87 0b 00 6c    movb   $0x6c,0xb87c8
  23:   c6 05 c9 87 0b 00 1f    movb   $0x1f,0xb87c9
  2a:   c6 05 ca 87 0b 00 6c    movb   $0x6c,0xb87ca
  31:   c6 05 cb 87 0b 00 1f    movb   $0x1f,0xb87cb
  38:   c6 05 cc 87 0b 00 6f    movb   $0x6f,0xb87cc
  3f:   c6 05 cd 87 0b 00 1f    movb   $0x1f,0xb87cd
  46:   c6 05 ce 87 0b 00 20    movb   $0x20,0xb87ce
  4d:   c6 05 cf 87 0b 00 1f    movb   $0x1f,0xb87cf
  54:   c6 05 d0 87 0b 00 57    movb   $0x57,0xb87d0
  5b:   c6 05 d1 87 0b 00 1f    movb   $0x1f,0xb87d1
  62:   c6 05 d2 87 0b 00 6f    movb   $0x6f,0xb87d2
  69:   c6 05 d3 87 0b 00 1f    movb   $0x1f,0xb87d3
  70:   c6 05 d4 87 0b 00 72    movb   $0x72,0xb87d4
  77:   c6 05 d5 87 0b 00 1f    movb   $0x1f,0xb87d5
  7e:   c6 05 d6 87 0b 00 6c    movb   $0x6c,0xb87d6
  85:   c6 05 d7 87 0b 00 1f    movb   $0x1f,0xb87d7
  8c:   c6 05 d8 87 0b 00 64    movb   $0x64,0xb87d8
  93:   c6 05 d9 87 0b 00 1f    movb   $0x1f,0xb87d9
  9a:   c6 05 da 87 0b 00 21    movb   $0x21,0xb87da
  a1:   c6 05 db 87 0b 00 1f    movb   $0x1f,0xb87db
  a8:   90                      nop
  a9:   90                      nop
  aa:   90                      nop
  ab:   90                      nop
  ac:   90                      nop
  ad:   90                      nop
  ae:   90                      nop
  af:   90                      nop
  b0:   eb fe                   jmp    0xb0
  b2:   66 90                   xchg   %ax,%ax
  b4:   66 90                   xchg   %ax,%ax
  b6:   66 90                   xchg   %ax,%ax
  b8:   66 90                   xchg   %ax,%ax
  ba:   66 90                   xchg   %ax,%ax
  bc:   66 90                   xchg   %ax,%ax
  be:   66 90                   xchg   %ax,%ax
  c0:   eb fe                   jmp    0xc0
  c2:   66 90                   xchg   %ax,%ax

As Ross states, aligning functions and branch points to 16 bytes is a common x86 optimization recommended by Intel, although it can occasionally be less efficient, such as in your case. Deciding optimally whether or not to align is a hard problem for a compiler, and I believe LLVM simply opts to always align. For more information, see "Performance optimisations of x86-64 assembly - Alignment and branch prediction".
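To make the cost concrete, here is a minimal sketch of the arithmetic behind the padding in your listing. The helper name padding_to_boundary is made up for illustration and is not part of rustc or LLVM. The two offsets are simply where code ends in the disassembly above.

```rust
/// Number of padding bytes needed to advance `offset` to the next
/// multiple of `align`. `align` must be a power of two.
/// (Hypothetical helper, for illustration only.)
fn padding_to_boundary(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    // Round up to the boundary, then subtract the starting offset.
    let aligned = (offset + align - 1) & !(align - 1);
    aligned - offset
}

fn main() {
    // Offsets at which code ends in the listing from the question:
    // 0xa8 after the last movb, 0xb2 after the first `jmp`.
    for &end in &[0xa8usize, 0xb2] {
        let pad = padding_to_boundary(end, 16);
        println!(
            "code ends at {:#x}: {} padding bytes, next block at {:#x}",
            end,
            pad,
            end + pad
        );
    }
}
```

The first gap is filled with single-byte nops and the second with two-byte xchg %ax,%ax instructions, which is why the instruction counts differ even though both gaps come from the same 16-byte rule.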

As red75prime's comment hints (but doesn't explain), LLVM uses the value of the align-all-blocks option as the byte alignment for branch points, so setting it to 1 will disable alignment. Note that this applies globally, and that comparison benchmarks are recommended.
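As a rough estimate of what is at stake, the sketch below lays out the three code chunks from your listing under different alignments. It is only a model: image_size is a hypothetical helper, the chunk sizes are read off the disassembly, and the final rounding stands in for the . = ALIGN(4) in your linker script. It also assumes every chunk can be packed back to back, which this thread does not confirm for function entry points as opposed to branch targets.

```rust
/// Lay out code chunks back to back, starting each one at a multiple of
/// `align`, then pad the end of the image to `tail_align`.
/// Returns the resulting image size in bytes.
/// (Hypothetical model, not how the linker is actually implemented.)
fn image_size(chunk_sizes: &[usize], align: usize, tail_align: usize) -> usize {
    assert!(align.is_power_of_two() && tail_align.is_power_of_two());
    let round_up = |n: usize, a: usize| (n + a - 1) & !(a - 1);
    let mut offset = 0;
    for &size in chunk_sizes {
        // Pad up to the chunk's start boundary, then place the chunk.
        offset = round_up(offset, align) + size;
    }
    round_up(offset, tail_align)
}

fn main() {
    // Sizes taken from the listing: 24 stores of 7 bytes each,
    // followed by two 2-byte `jmp` instructions.
    let chunks = [24 * 7, 2, 2];
    let aligned = image_size(&chunks, 16, 4);
    let packed = image_size(&chunks, 1, 4);
    println!("16-byte alignment: {} bytes", aligned);
    println!("no alignment:      {} bytes", packed);
    println!("saved:             {} bytes", aligned - packed);
}
```

With rustc, LLVM options of this kind are normally forwarded through -C llvm-args, so it is worth comparing the real binary size before and after rather than relying on this model.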
