Why does the short (16-bit) variable mov a value to a register and store that, unlike other widths?

int main()
{
00211000  push        ebp  
00211001  mov         ebp,esp  
00211003  sub         esp,10h  
    char charVar1;
    short shortVar1;
    int intVar1;
    long longVar1;
    
    charVar1 = 11;
00211006  mov         byte ptr [charVar1],0Bh  

    shortVar1 = 11;
0021100A  mov         eax,0Bh  
0021100F  mov         word ptr [shortVar1],ax  

    intVar1 = 11;
00211013  mov         dword ptr [intVar1],0Bh 
 
    longVar1 = 11;
0021101A  mov         dword ptr [longVar1],0Bh  
}

The other data types don't go through a register; only the short type does. Why is that?

GCC does the same thing, using mov reg, imm32 / mov m16, reg instead of mov mem, imm16.

It's to avoid LCP stalls on Intel P6-family CPUs from a 16-bit operand-size mov imm16.

An LCP (length-changing prefix) stall occurs when a prefix changes the length of the rest of the instruction compared to the same machine-code bytes without the prefix.

mov word ptr [ebp - 8], 11 would involve a 66 prefix that makes the rest of the instruction 5 bytes (opcode + modrm + disp8 + imm16) instead of 7 (opcode + modrm + disp8 + imm32) for the same opcode / modrm bytes.

 66 c7 45 f8 0b 00          mov     WORD PTR [ebp-0x8],0xb
    c7 45 f8 0b 00 00 00    mov    DWORD PTR [ebp-0x8],0xb
    ^
  opcode
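The byte counts above can be checked with a tiny length calculator. This is a minimal sketch, not how a real pre-decoder works: it assumes 32-bit mode, handles only opcode C7 (mov r/m, imm) with an optional 66 prefix, and does not check the /0 reg field. The function and array names (modrm_bytes, mov_rm_imm_length, has_length_changing_prefix, mov_word_example, mov_dword_example) are hypothetical. The LCP test decodes the same bytes with and without the prefix and compares the length of everything after it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by ModRM plus any SIB byte and displacement,
   using 32-bit addressing rules (no 67 address-size prefix). */
static size_t modrm_bytes(const uint8_t *modrm_ptr)
{
    uint8_t modrm = modrm_ptr[0];
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;
    size_t len = 1;                          /* the ModRM byte itself */

    if (mod == 3)
        return len;                          /* register operand */
    if (rm == 4) {                           /* a SIB byte follows */
        len += 1;
        if (mod == 0 && (modrm_ptr[1] & 7) == 5)
            len += 4;                        /* SIB with no base: disp32 */
    } else if (mod == 0 && rm == 5) {
        len += 4;                            /* absolute [disp32] */
    }
    if (mod == 1)
        len += 1;                            /* disp8 */
    else if (mod == 2)
        len += 4;                            /* disp32 */
    return len;
}

/* Total length of "mov r/m, imm" (opcode C7) in 32-bit code, with an
   optional 66 operand-size prefix. Returns 0 for any other opcode.
   The reg field of ModRM (/0) is not checked in this sketch. */
static size_t mov_rm_imm_length(const uint8_t *code)
{
    size_t pos = 0;
    bool opsize16 = false;

    if (code[pos] == 0x66) {
        opsize16 = true;
        pos++;
    }
    if (code[pos] != 0xC7)
        return 0;
    pos++;                                   /* the opcode byte */
    pos += modrm_bytes(&code[pos]);
    pos += opsize16 ? 2 : 4;                 /* imm16 vs imm32 */
    return pos;
}

/* True if a leading 66 prefix changes the length of the rest of the
   instruction, compared with decoding the same bytes without it. */
static bool has_length_changing_prefix(const uint8_t *code)
{
    if (code[0] != 0x66)
        return false;
    size_t with_prefix = mov_rm_imm_length(code);
    size_t without_prefix = mov_rm_imm_length(code + 1);
    return with_prefix != 0 && with_prefix - 1 != without_prefix;
}

/* The two encodings from the listing above. */
static const uint8_t mov_word_example[] = {0x66, 0xC7, 0x45, 0xF8, 0x0B, 0x00};
static const uint8_t mov_dword_example[] = {0xC7, 0x45, 0xF8, 0x0B, 0x00, 0x00, 0x00};
```

For mov_word_example the total is 6 bytes (the prefix plus 5), while the same bytes read without the 66 would need 7, so has_length_changing_prefix flags it; mov_dword_example has no prefix and is 7 bytes.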

This length change confuses the instruction-length-finding stage (pre-decode), which happens before chunks of machine code are routed to the actual decoders. The pre-decoders are forced to back up and use a slower method that accounts for prefixes in the way they look at opcodes. (Parallel decode of x86 machine code is hard.) The penalty for this backup can be up to 11 cycles depending on the microarchitecture and the alignment of the instruction, and it should be avoided if possible.

See "Does a Length-Changing Prefix (LCP) incur a stall on a simple x86_64 instruction?" for lots of details on what a length-changing-prefix stall is, the performance effect of stalling the pre-decode stage for a few cycles on Intel P6 and SnB-family CPUs, and how Sandybridge-family (modern mainstream Intel) special-cases mov opcodes to avoid LCP stalls from 16-bit immediates.


mov specifically doesn't have a problem on modern Intel

Sandybridge-family removed LCP stalls for mov specifically (they still exist for other instructions), so this tuning decision only helps Nehalem and earlier.

AFAIK, it's not a thing on Silvermont-family, nor on any AMD, so this is probably something MSVC and GCC should update for their tune=generic, since P6-family CPUs are less and less relevant these days. (And if the latest dev versions of GCC / MSVC changed now, it would be another year or so before lots of software distributions / releases were built with the new compiler.)
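The tuning rules described above can be summarized as a small predicate. This is a hypothetical sketch, not code from GCC or MSVC: the enum names and functions (imm16_lcp_risk, avoid_mov_imm16) are invented for illustration, and the Silvermont and AMD entries rest on the "AFAIK" claim above rather than on vendor documentation.

```c
#include <stdbool.h>

/* Hypothetical microarchitecture families, per the rules described above. */
enum uarch_family {
    UARCH_INTEL_P6,          /* P6 family, up to and including Nehalem */
    UARCH_INTEL_SANDYBRIDGE, /* Sandybridge family (modern mainstream Intel) */
    UARCH_INTEL_SILVERMONT,
    UARCH_AMD
};

enum insn_kind { INSN_MOV, INSN_OTHER };

/* Could an instruction of this kind with a 16-bit immediate
   (66 prefix) cause an LCP stall on this family? */
static bool imm16_lcp_risk(enum uarch_family f, enum insn_kind k)
{
    switch (f) {
    case UARCH_INTEL_P6:
        return true;              /* any imm16 with a 66 prefix */
    case UARCH_INTEL_SANDYBRIDGE:
        return k != INSN_MOV;     /* mov is special-cased */
    case UARCH_INTEL_SILVERMONT:
    case UARCH_AMD:
    default:
        return false;             /* AFAIK no LCP stalls here */
    }
}

/* Should a compiler emit "mov reg, imm32 ; mov m16, reg" instead of
   "mov m16, imm16"? Only when the target can stall on the latter. */
static bool avoid_mov_imm16(enum uarch_family f)
{
    return imm16_lcp_risk(f, INSN_MOV);
}
```

Under these rules, avoid_mov_imm16 is true only for P6-family, which is why the workaround only pays off on Nehalem and earlier, while imm16_lcp_risk(UARCH_INTEL_SANDYBRIDGE, INSN_OTHER) stays true.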

clang doesn't do this optimization, and it's not a disaster even on old P6-family CPUs, because most software doesn't use a lot of short / int16_t variables. (And the bottleneck isn't always the front-end; often it's cache misses.)


Examples

Storing to the stack at all in this function is of course due to not enabling optimization. Since those variables aren't volatile, they should be optimized away completely, because nothing reads them later. When you want to make examples of asm output, don't write a main; write a function that has to have some side effect, e.g. storing through a pointer, or use volatile.

void foo(short *p){
    volatile short x = 123;
    *p = 123;
}

Compiles with MSVC 19.14 -O2 (https://godbolt.org/z/eWhzhEsEa):

x$ = 8
p$ = 8
foo     PROC                                          ; COMDAT
        mov     eax, 123                      ; 0000007bH
        mov     WORD PTR x$[rsp], ax
        mov     WORD PTR [rcx], ax
        ret     0
foo     ENDP

Or with GCC 11.2 -O3, which sucks even more, not CSEing / reusing the register constant:

foo:
        mov     eax, 123
        mov     edx, 123
        mov     WORD PTR [rsp-2], ax
        mov     WORD PTR [rdi], dx
        ret

But we can see that this is Intel-specific tuning, since with -O3 -march=znver1 (AMD Zen 1) we get:

foo:
        mov     WORD PTR [rsp-2], 123
        mov     WORD PTR [rdi], 123
        ret

Unfortunately, GCC still does the LCP avoidance for mov with -march=skylake, so it doesn't know the full rules.

And if we use *p += 12345; (a number too big to fit in an imm8, which add allows, unlike mov) instead of just =, then ironically GCC uses a length-changing prefix with -march=skylake (as does MSVC), creating a stall: add WORD PTR [rdi], 12345.
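A minimal pair of functions for reproducing that experiment on a compiler explorer, assuming the same flags as above. The names add_big and add_small are hypothetical. The small-constant version is included for contrast: with an imm8 the 66 83 /0 ib form keeps the same length with or without the prefix, so it would not be expected to cause an LCP stall, though the exact instruction a given compiler picks may vary.

```c
/* Hypothetical test functions for the += experiment described above. */
void add_big(short *p)
{
    /* 12345 doesn't fit in a sign-extended imm8, so a memory-destination
       add needs an imm16 with a 66 prefix (a length-changing prefix). */
    *p += 12345;
}

void add_small(short *p)
{
    /* 12 fits in an imm8, so the 66 83 /0 ib encoding can be used; the 66
       prefix doesn't change that instruction's length. */
    *p += 12;
}
```

Compiling both with -O3 -march=skylake and comparing the disassembly shows which form each constant gets.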

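Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.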
