
Assembly - Swap function - Why won't it work?

I need to create a function that swaps the value at &x with the value at &y (that is, swap *(&x) and *(&y)).

Swap:

    push EBP
    mov EBP,ESP
    mov EBX, [EBP+12] ; ebx = *x
    mov EAX, DWORD [EBX] ;eax = ebx = *x
    mov DWORD [EBP-4], EAX ; [ebp-4] = eax =*x
    mov EDX, [EBP+8] ; edx = *y
    mov EAX, DWORD [EDX] ; eax = *edx = *y
    mov DWORD [EBX], EAX ; ebx = eax = *y
    mov EAX, DWORD [EBP-4] ; eax = *x
    mov DWORD [EDX], EAX ; edx = *x
    pop EBP ; ebx = *y and edx = *x
    ret

I call it like this:

    // call Swap
    push x
    push y
    call swap

I don't understand why it's not working. I added comments that explain my understanding of it. What's wrong with my implementation? How can I fix it?

You never actually reserve the stack memory you use when you access a dword at [EBP-4]. Anything that runs asynchronously and uses the stack below ESP can overwrite it: interrupt routines, signal handlers, asynchronously called procedures, or whatever else applies on your OS.

The code should be like this instead:

swap:
    push  EBP
    mov   EBP,ESP           ; make a traditional stack frame

    sub   ESP, 4         ; reserve memory for a local variable at [EBP-4]

    mov   EBX, [EBP+12]        ; ebx = &x
    mov   EAX, DWORD [EBX]     ; eax = x
    mov   DWORD [EBP-4], EAX   ; [ebp-4] = eax = x
    mov   EDX, [EBP+8]         ; edx = &y
    mov   EAX, DWORD [EDX]     ; eax = y
    mov   DWORD [EBX], EAX     ; *&x = y
    mov   EAX, DWORD [EBP-4]   ; eax = x reloaded from the local
    mov   DWORD [EDX], EAX     ; *&y = x

    leave          ; remove locals (by restoring ESP), restore EBP

    ret

Also, make sure that you're passing the addresses of the variables x and y as parameters, not the values of the variables. push x + push y will pass the addresses of x and y in NASM, but in TASM and MASM they pass the values of x and y (there you would write push offset x and push offset y to pass the addresses).

Aside from Alexey's bugfix, you could make this significantly more efficient. (Of course, inlining the swap and optimizing at the call site is even better.)

There's no need for a local temporary on the stack: you could instead reload one of the addresses twice, or save/restore ESI and use it as a temporary.

You're actually destroying EBX, which is call-preserved in all the normal C calling conventions. In most 32-bit x86 calling conventions, EAX, ECX, and EDX are the three call-clobbered registers you can use without saving/restoring them, while the others are call-preserved. (That is, your caller expects you not to destroy their values, so you can only use them if you put back the original value. This is why EBP has to be restored after you use it as a frame pointer.)


When compiling a stand-alone (not inlined) definition of a swap function, gcc -O3 -m32 saves/restores EBX so it has 4 registers to work with. clang chooses ESI instead.

void swap(int *px, int *py) {
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

On the Godbolt compiler explorer:

# gcc8.2 -O3 -m32 -fverbose-asm
# gcc itself emitted the comments on the following instructions
swap:
        push    ebx     #
        mov     edx, DWORD PTR [esp+8]    # px, px
        mov     eax, DWORD PTR [esp+12]   # py, py
        mov     ecx, DWORD PTR [edx]      # tmp, *px_3(D)
        mov     ebx, DWORD PTR [eax]      # tmp91, *py_5(D)
        mov     DWORD PTR [edx], ebx      # *px_3(D), tmp91
        mov     DWORD PTR [eax], ecx      # *py_5(D), tmp
        pop     ebx       #
        ret  

# DWORD PTR is the gas .intel_syntax equivalent of NASM's DWORD
# you can just remove them all because the register implies an operand size

It also avoids making a legacy stack frame. If you want to see code-gen with a frame pointer, add -fno-omit-frame-pointer to the compiler options. (Godbolt will recompile and show you the asm; it's a very handy site for exploring compiler options and code changes.)

64-bit calling conventions already pass args in registers and have enough scratch registers, so we get just 4 instructions (plus the ret), which is much more efficient.


As I mentioned, another option is to reload one of the pointer args twice:

swap:
       # without a push, offsets relative to ESP are smaller by 4
        mov     edx, [esp+4]    # edx = px   reused later
        mov     eax, [esp+8]    # eax = py   also reused later
        mov     ecx, [edx]      # ecx = tmp = *px   lives for the whole function

        mov     eax, [eax]      # eax = *py   destroying our register copy of py
        mov    [edx], eax       # *px = *py;  done with px, can now destroy it

        mov     edx, [esp+8]   # edx = py
        mov    [edx], ecx       # *py = tmp;
        ret  

Only 7 instructions instead of 8 (not counting the ret). Loading the same value twice is very cheap, and thanks to out-of-order execution it's not a problem to get the store address ready quickly, even though in program order the address is loaded by the instruction right before the store.

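Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.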

 
© 2020-2024 STACKOOM.COM