Returning a float from a 64-bit assembly function that uses the x87 FPU

Question

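I'm trying to write a program that evaluates an equation (which equation doesn't matter right now) using 64-bit registers, floating-point numbers and coprocessor (x87 FPU) instructions. Unfortunately, I don't know how to get the final result of the equation back as a floating-point value. I can do: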

fist qword ptr [bla]
mov rax,bla

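and change the function's return type to int to get my value, but I can't access it as a float. Even if I leave the result in ST(0) (the top of the coprocessor stack), it doesn't work as expected and my C++ program gets the wrong result. My assembly code is: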

public funct
.data
bla qword ?
bla2 qword 10.0
.code
funct PROC
push rbp
mov rbp, rsp
push rbx

mov bla,rcx
fild qword ptr[bla]

fld qword ptr [bla2]
fmul st(0), st(1)
fist dword ptr [bla]
pop rbx
pop rbp
ret
funct ENDP
END

My C++ code is:

#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>

extern "C" float funct(long long n);
int main(){

    float value1= funct(3);

    return 0;
}

What is the problem, and how do I fix it?

Answer 1

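Your question is a bit ambiguous, and so is your code. I'll present a few ideas using both x87 FPU and SSE instructions. Using x87 FPU instructions in 64-bit code is discouraged; SSE/SSE2 is preferred. SSE/SSE2 is available on all 64-bit AMD and 64-bit Intel x86 processors.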

32-bit floats in 64-bit code using the x87 FPU

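If your question is "How do I write 64-bit assembly code that uses 32-bit floats with the x87 FPU?", then your C++ code looks fine, but your assembly code needs some work. Your C++ code declares the function's return type as a 32-bit float: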

extern "C" float funct(long long n);

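We need to create a function that returns a 32-bit float. Your assembly code can be modified as follows. I kept the stack frame code and the push/pop of RBX because I assume you only gave us a minimal example and your real code uses RBX. With that in mind, the following code should work: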

public funct
.data
ten REAL4 10.0                     ; Define variable ten as 32-bit (4-byte float)
                                   ; REAL4 and DWORD are both same size. 
                                   ; REAL4 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx

    mov [rbp+16],rcx               ; 32 byte shadow space is just above the return address
                                   ; at RBP+16 (this address is 16 byte aligned). Rather 
                                   ; than use a temporary variable in the data section to 
                                   ; store the value of RCX, we just store it to the 
                                   ; shadow space on the stack.
    fild QWORD ptr[rbp+16]         ; Load and convert 64-bit integer into st(0)
    fld [ten]                      ; st(0) => st(1), st(0) = 10.0
    fmulp                          ; st(1)=st(1)*st(0), st(1) => st(0)
    fstp REAL4 ptr [rbp+16]        ; Store result to shadow space as 32-bit float
    movss xmm0, REAL4 ptr [rbp+16] ; Load single scalar (32-bit float) into xmm0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

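I have commented the code, but one thing worth noting is that I don't use a second variable in the DATA section. The 64-bit Windows calling convention requires the caller to make sure the stack is aligned on a 16-byte boundary and to allocate 32 bytes of shadow space (AKA the register parameter area) before making the call. This area can be used as scratch space. Because we set up a stack frame, the saved RBP is at RBP+0, the return address is at RBP+8, and the scratch area starts at RBP+16. If you didn't use a stack frame, the return address would be at RSP+0 and the shadow space would start at RSP+8. We can store the result of the floating-point operation there instead of in the QWORD labeled bla.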
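To make these offsets concrete, here is a small worked example written as C++ compile-time checks. It assumes a hypothetical 16-byte-aligned caller RSP of 0x7FF0 just before the CALL; only the alignment of that value matters, and the constant names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Worked example of the stack layout inside funct (Windows x64 convention).
// kCallerRsp is a hypothetical value; only its 16-byte alignment matters.
constexpr std::uint64_t kCallerRsp = 0x7FF0;  // RSP just before CALL
static_assert(kCallerRsp % 16 == 0, "caller aligns RSP to 16 before CALL");

// Just before CALL, the caller's 32-byte shadow space is [RSP, RSP+32).
constexpr std::uint64_t kShadowStart = kCallerRsp;
constexpr std::uint64_t kShadowSize = 32;

// CALL pushes the 8-byte return address: on entry [RSP] = return address.
constexpr std::uint64_t kRspAtEntry = kCallerRsp - 8;

// push rbp ; mov rbp, rsp  ->  [RBP] = saved RBP.
constexpr std::uint64_t kRbp = kRspAtEntry - 8;

static_assert(kRbp % 16 == 0, "RSP/RBP are 16-byte aligned after push rbp");
static_assert(kRbp + 8 == kRspAtEntry, "return address at RBP+8");
static_assert(kRbp + 16 == kShadowStart, "shadow space starts at RBP+16");
static_assert(kShadowStart + kShadowSize == kRbp + 48, "shadow space ends at RBP+48");
static_assert(kRspAtEntry + 8 == kShadowStart, "without a frame: shadow at RSP+8");
static_assert(kShadowStart % 16 == 0, "[rbp+16] is 16-byte aligned");
```

If any of these relationships were wrong the file would fail to compile, so the static_asserts double as the proof of the layout described above.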

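Before leaving the function it is a sensible idea to unwind the floating-point stack so that nothing is left on it. I use the popping forms of the FPU instructions (FMULP and FSTP) to pop the registers once I'm done with them.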

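The 64-bit Microsoft calling convention requires floating-point values to be returned in XMM0. We use the SSE instruction MOVSS to move the scalar single (32-bit float) into the XMM0 register. That is where the C++ code expects the return value. This is also why your original code did not work: the result was left in ST(0) or stored as an integer, and XMM0 was never set, so the C++ caller read whatever happened to be in XMM0.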

32-bit floats in 64-bit code using SSE

Building on the ideas in the previous section, we can modify the code to use SSE instructions with 32-bit floats. An example of such code:

public funct
.data
ten REAL4 10.0                     ; Define variable ten as 32-bit (4-byte float)
                                   ; REAL4 and DWORD are both same size. 
                                   ; REAL4 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx
    cvtsi2ss xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar single(float) and store in XMM0
    mulss xmm0, [ten]              ; 32-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.
    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

This code uses SSE instructions and does away with the x87 FPU entirely. Specifically, we use:

    cvtsi2ss xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar single(float) and store in XMM0

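CVTSI2SS converts a scalar integer to a scalar single (float). In this case the 64-bit integer value in RCX is converted to a 32-bit float and stored in XMM0. XMM0 is the register we put the return value in. XMM0 through XMM5 are considered volatile, so we don't need to preserve their values.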
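The same conversion is what static_cast<float> does in C++. A minimal sketch, assuming a typical x86-64 compiler that implements it with CVTSI2SS under the default round-to-nearest-even mode, shows that large 64-bit integers do not always survive the conversion exactly, because a float has only a 24-bit significand. int64_to_float is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// C++ counterpart of CVTSI2SS xmm0, rcx: convert a 64-bit integer to float.
// float has a 24-bit significand, so integers above 2^24 may not be exact;
// on x86-64 the result is rounded using the MXCSR rounding mode
// (round-to-nearest-even by default).
float int64_to_float(std::int64_t n) {
    return static_cast<float>(n);
}
```

For the small value in the question (funct(3)) the conversion is exact; rounding can only matter for |n| > 2^24, e.g. 16777217 (2^24 + 1) becomes 16777216.0f.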

    mulss xmm0, [ten]              ; 32-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

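MULSS is an SSE instruction that multiplies scalar singles (floats). In this case MULSS computes XMM0 = XMM0 * (32-bit float memory operand), i.e. it multiplies the 32-bit float in XMM0 by the 32-bit float 10.0. Since XMM0 now holds our final result, there is nothing left to do except exit the function properly.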
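If you want something to compare the assembly result against from the C++ side, here is a sketch of a portable reference for the SSE version, assuming the equation is just n * 10.0 as in these examples. On x86-64 a compiler typically emits CVTSI2SS followed by MULSS for this expression. funct_reference is a hypothetical name, not part of the answer's code.

```cpp
#include <cassert>

// Hypothetical portable reference for the SSE version of funct.
float funct_reference(long long n) {
    float x = static_cast<float>(n);  // corresponds to CVTSI2SS xmm0, rcx
    return x * 10.0f;                 // corresponds to MULSS xmm0, [ten]
}
```

After linking the assembly file, something like assert(funct(3) == funct_reference(3)); in main would confirm that the two agree for small arguments, where both the conversion and the product are exact.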

64-bit double floats in 64-bit code using the x87 FPU

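This is a variation of the first example, but now we use 64-bit floats, known as the double type in C++, REAL8 (or QWORD) in the assembler, and scalar double in SSE2. Because we now use double as the return type, we must change the C++ code to: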

#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>

extern "C" double funct(long long n);

int main() {    
    double value1 = funct(3);

    return 0;
}

The assembly code looks like this:

public funct
.data
ten REAL8 10.0                     ; Define variable ten as 64-bit (8-byte float)
                                   ; REAL8 and QWORD are both same size. 
                                   ; REAL8 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx

    mov [rbp+16],rcx               ; 32 byte shadow space is just above the return address
                                   ; at RBP+16 (this address is 16 byte aligned). Rather 
                                   ; than use a temporary variable in the data section to 
                                   ; store the value of RCX, we just store it to the 
                                   ; shadow space on the stack.
    fild QWORD ptr[rbp+16]         ; Load and convert 64-bit integer into st(0)
    fld [ten]                      ; st(0) => st(1), st(0) = 10.0
    fmulp                          ; st(1)=st(1)*st(0), st(1) => st(0)
    fstp REAL8 ptr [rbp+16]        ; Store result to shadow space as 64-bit float
    movsd xmm0, REAL8 ptr [rbp+16] ; Load double scalar (64-bit float) into xmm0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

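This code is almost identical to the x87 code that uses 32-bit floats. We use REAL8 (the same size as QWORD) to store a 64-bit float, and MOVSD to move the 64-bit double float (scalar double) into XMM0. MOVSD is an SSE2 instruction. It is important to return a float of the correct size in XMM0. If you used MOVSS here, only the low 32 bits of the double would be loaded (and the rest of XMM0 zeroed), so the value returned to the C++ function would be wrong.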
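To see why the size matters, the following sketch simulates in C++ what the caller would receive if the double result were stored with fstp REAL8 but loaded with MOVSS instead of MOVSD. MOVSS from memory loads only 4 bytes and zeroes the rest of XMM0, so the high half of the double (sign, exponent and top of the mantissa) is lost. as_if_loaded_with_movss is a hypothetical helper, and it assumes x86's little-endian byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simulates: fstp REAL8 ptr [mem]  followed by  movss xmm0, REAL4 ptr [mem],
// with the caller then reading XMM0 as a double.
double as_if_loaded_with_movss(double stored) {
    std::uint64_t bits;
    std::memcpy(&bits, &stored, sizeof bits);  // raw IEEE-754 bit pattern
    // On little-endian x86 the first 4 bytes in memory are the low 32 bits.
    // MOVSS keeps those and zeroes the upper 32 bits of the low quadword.
    bits &= 0xFFFFFFFFull;
    double seen_by_caller;
    std::memcpy(&seen_by_caller, &bits, sizeof seen_by_caller);
    return seen_by_caller;
}
```

For the question's funct(3), the stored double 30.0 has the bit pattern 0x403E000000000000, whose low 32 bits are all zero, so the caller would see 0.0 instead of 30.0.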

64-bit double floats in 64-bit code using SSE2

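This is a variation of the second example, but now we use 64-bit floats, known as the double type in C++, REAL8 (or QWORD) in the assembler, and scalar double in SSE2. The C++ code should be the code from the previous section, so that it uses double instead of float. The assembly code is similar to this: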

public funct
.data
ten REAL8 10.0                     ; Define variable ten as 64-bit (8-byte float)
                                   ; REAL8 and QWORD are both same size. 
                                   ; REAL8 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx
    cvtsi2sd xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar double(double float) and store in XMM0
    mulsd xmm0, [ten]              ; 64-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.
    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

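The main difference from the second example is that we use CVTSI2SD instead of CVTSI2SS. The SD in the instruction name means we are converting to a scalar double (64-bit double float). Similarly, we multiply scalar doubles with the MULSD instruction. XMM0 will hold the 64-bit scalar double (double float) that is returned to the calling function.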
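For comparison, a sketch of the double-precision equivalent in C++, again assuming typical x86-64 code generation (here CVTSI2SD followed by MULSD). Because a double has a 53-bit significand, 2^24 + 1 now converts exactly; rounding only starts above 2^53. funct_reference_double and int64_to_double are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable reference for the SSE2 version of funct.
double funct_reference_double(std::int64_t n) {
    double x = static_cast<double>(n);  // corresponds to CVTSI2SD xmm0, rcx
    return x * 10.0;                    // corresponds to MULSD xmm0, [ten]
}

// Plain int64 -> double conversion (what CVTSI2SD does on its own).
// Exact for |n| <= 2^53; beyond that it rounds (to nearest-even by default).
double int64_to_double(std::int64_t n) {
    return static_cast<double>(n);
}
```

The wider significand is the practical reason to prefer double when the integer inputs can be large.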

Answer 2

You can pass the address of the result as a parameter:

main.cpp:

#include<stdio.h>

extern "C" void funct(long long, float*);

int main ( void )
{

    float value1 = 0;           // float = DWORD ("double" would be QWORD)!
    funct(3, &value1);
    printf ("%f\n",value1);

    return 0;
}

callee.asm:

.data
    bla qword ?
    bla2 qword 10.0

.code
funct PROC
    push rbp
    mov rbp, rsp
    push rbx

    mov bla,rcx
    fild qword ptr[bla]         ; -> st(1)

    fld qword ptr [bla2]        ; -> st(0)
    fmul st(0), st(1)
    fstp dword ptr [rdx]        ; pop the first value
    fstp st(0)                  ; pop the second value (FFREE only tags it empty, it does not pop)

    pop rbx
    pop rbp
    ret
funct ENDP

END
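To try the out-parameter approach without the assembly file, a hypothetical pure C++ stand-in with the same shape could look like this. funct_stub and call_like_main are illustrative names only. Under the Windows x64 convention the pointer argument arrives in RDX, which is why the assembly above can store the result through [rdx].

```cpp
#include <cassert>

// Hypothetical stand-in with the same shape as the assembly funct above:
// the result is written through the pointer instead of returned in XMM0.
extern "C" void funct_stub(long long n, float* result) {
    *result = static_cast<float>(n) * 10.0f;  // same computation: n * 10.0
}

// Mirrors main() above: the caller owns the float and passes its address.
float call_like_main(long long n) {
    float value1 = 0;
    funct_stub(n, &value1);
    return value1;
}
```

Because the result travels through memory rather than a register, this approach does not depend on which register the calling convention uses for floating-point return values.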

Answer 1: score 3, accepted, 2016-01-19 00:19:26

Answer 2: score 1, 2016-01-17 09:27:20
