C: external assembly function returns different results for the same input

Question

I have a program in C which uses a NASM function. Here is the code of the C program:

#include <stdio.h>
#include <string.h>
#include <math.h>

extern float hyp(float a); // supposed to calculate 1/(2 - a) + 6

void test(float (*f)(float)){
    printf("%f %f %f\n", f(2.1), f(2.1), f(2.1));
}

void main(int argc, char** argv){
    for(int i = 1; i < argc; i++){
        if(!strcmp(argv[i], "calculate")){
            test(hyp);
        }
    }
}

And here is the NASM function:

section .data
    a dd 1.0
    b dd 2.0
    c dd 6.0

section .text
global hyp
hyp:
    push ebp
    mov ebp, esp
    finit

    fld dword[b]
    fsub dword[ebp + 8]
    fstp dword[b]
    fld dword[a]
    fdiv dword[b]
    fadd dword[c]

    mov esp, ebp
    pop ebp
    ret

These programs were built and linked on Linux with gcc and nasm. Here is the Makefile:

all: project clean
main.o: main.c
    gcc -c main.c -o main.o -m32 -std=c99
hyp.o: hyp.asm
    nasm -f elf32 -o hyp.o hyp.asm -D UNIX
project: main.o hyp.o
    gcc -o project main.o hyp.o -m32 -lm
clean:
    rm -rf *.o

When the program is run, it outputs this:

5.767442 5.545455 -4.000010

The last number is correct. My question is: why do these results differ even though the input is the same?

Answer 1

The bug is that you do this:

fstp dword[b]

That overwrites the value of b, so the next time you call the function, the constant is wrong: each call subtracts the argument from whatever the previous call left in b. In the overall program's output, this shows up as the rightmost evaluation being the only correct one, because the compiler evaluated the arguments to printf from right to left. (It is allowed to evaluate the arguments to a multi-argument function in any order it wants.)
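The effect can be reproduced without any assembly. The following is a minimal sketch in C, assuming a model in which a writable static variable plays the role of `b`; the names `b_state`, `hyp_buggy_model`, `hyp_fixed_model`, and `three_calls` are made up for illustration and do not appear in the question.

```c
/* Hypothetical C model of the routine from the question. `b_state` stands in
   for the writable `b` in .data; the names are illustrative only. */
static float b_state = 2.0f;

float hyp_buggy_model(float x) {
    b_state = b_state - x;         /* fld b / fsub x / fstp b: overwrites b */
    return 1.0f / b_state + 6.0f;  /* fld a / fdiv b / fadd c */
}

/* Stateless version: the intermediate lives in a local, b is never modified. */
float hyp_fixed_model(float x) {
    const float b = 2.0f;
    float diff = b - x;
    return 1.0f / diff + 6.0f;
}

/* One call per statement, so the evaluation order is fixed by the language. */
void three_calls(float (*f)(float), float out[3]) {
    out[0] = f(2.1f);
    out[1] = f(2.1f);
    out[2] = f(2.1f);
}
```

Calling `three_calls` with the buggy model yields roughly -4.00001, 5.545455, 5.767442 in call order, which is the question's output read from right to left. Making one call per statement, as `three_calls` does, also removes the dependence on the compiler's argument evaluation order.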

You should have used the .rodata section for your constants; then the program would crash rather than overwrite a constant.

You can avoid needing to store and reload an intermediate value by using fdivr instead of fdiv.

hyp:
    fld     dword [b]           ; st0 = 2.0
    fsub    dword [esp+4]       ; st0 = 2.0 - a
    fdivr   dword [a]           ; st0 = 1.0 / (2.0 - a)
    fadd    dword [c]           ; st0 = 1.0 / (2.0 - a) + 6.0
    ret

Alternatively, do what a Forth programmer would do, and load the constant 1 before everything else, so it's in ST(1) when it needs to be. This allows you to use fld1 instead of putting 1.0 in memory.

hyp:
    fld1                        ; st0 = 1.0
    fld     dword [b]           ; st0 = 2.0, st1 = 1.0
    fsub    dword [esp+4]       ; st0 = 2.0 - a
    fdivp                       ; st1 = st1 / st0, then pop
    fadd    dword [c]           ; st0 = 1.0 / (2.0 - a) + 6.0
    ret

You do not need to issue a finit, because the ABI guarantees that this was already done during process startup. You do not need to set up EBP for this function, as it does not make any function calls itself (the jargon term for this is "leaf procedure"), nor does it need any scratch space on the stack.

Another alternative, if you have a modern CPU, is to use the newer SSE2 instructions. That gives you normal registers instead of an operand stack, and also means the calculations are all actually done in float instead of 80-bit extended. This can be very important: some complex numerical algorithms will malfunction if they have more floating-point precision than the designers expected. Because you're using the 32-bit ELF ABI, though, the return value still needs to wind up in ST(0), and there are no direct move instructions between SSE and x87 registers, so you have to go through memory. I don't know how to write SSE2 instructions in Intel syntax, sorry, so the following is in AT&T syntax.

hyp:
    subl    $4, %esp
    movss   b, %xmm1
    subss   8(%esp), %xmm1
    movss   a, %xmm0
    divss   %xmm1, %xmm0
    addss   c, %xmm0
    movss   %xmm0, (%esp)
    flds    (%esp)
    addl    $4, %esp
    ret

In the 64-bit ELF ABI, with floating-point return values in XMM0 (and arguments passed in registers by default as well), that would just be:

hyp:
    movss   b(%rip), %xmm1
    subss   %xmm0, %xmm1
    movss   a(%rip), %xmm0
    divss   %xmm1, %xmm0
    addss   c(%rip), %xmm0
    ret
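The remark above about float versus 80-bit extended intermediates can be illustrated in C. This is only a sketch under stated assumptions: the function names are hypothetical, `volatile` is used to force each intermediate to be rounded to float, and `long double` is assumed to be the 80-bit extended type, which holds on typical x86 Linux targets but is not guaranteed elsewhere.

```c
/* Evaluates 1/(2 - a) + 6 with every intermediate rounded to float,
   which is what scalar SSE instructions such as subss/divss/addss do.
   The volatile qualifiers force a store to float after each step. */
float hyp_float_steps(float a) {
    volatile float d = 2.0f - a;
    volatile float q = 1.0f / d;
    volatile float r = q + 6.0f;
    return r;
}

/* Same expression with wider intermediates. On typical x86 Linux targets
   long double is the 80-bit x87 extended format (a platform assumption). */
long double hyp_wide_steps(float a) {
    long double d = 2.0L - (long double)a;
    long double q = 1.0L / d;
    return q + 6.0L;
}
```

For this particular formula the two results agree to within a few float rounding steps, so either precision is fine here; the difference only matters for algorithms that rely on exact float rounding at each step.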
