Return a float from a 64-bit assembly function that uses x87 FPU
I am trying to write a program that evaluates equations (which equation doesn't matter for now) using 64-bit registers, floats, and coprocessor instructions. Unfortunately I don't know how to access the final result of the equation as a float. I can do:
fist qword ptr [bla]
mov rax,bla
and change the function type to INT and get my value, but I cannot access it as a FLOAT. Even when I leave the result in ST(0) (the top of the coprocessor stack) it doesn't work as expected and my C++ program gets the wrong result. My assembly code is:
public funct
.data
bla qword ?
bla2 qword 10.0
.code
funct PROC
push rbp
mov rbp, rsp
push rbx
mov bla,rcx
fild qword ptr[bla]
fld qword ptr [bla2]
fmul st(0), st(1)
fist dword ptr [bla]
pop rbx
pop rbp
ret
funct ENDP
END
My C++ code is:
#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>
extern "C" float funct(long long n);
int main(){
float value1= funct(3);
return 0;
}
What is the problem, and how can I fix it?
Your question is a bit ambiguous, and so is your code. I'll present a few ideas using the x87 FPU and SSE instructions. The use of x87 FPU instructions is discouraged in 64-bit code, and SSE / SSE2 is preferred. SSE / SSE2 are available on all 64-bit AMD and 64-bit Intel x86 processors.
If your question is "How do I write 64-bit assembler code that uses 32-bit floats using the x87 FPU?" then your C++ code looks fine, but your assembler code needs some work. Your C++ code suggests the output type of the function is a 32-bit float:
extern "C" float funct(long long n);
We need to create a function that returns a 32-bit float. Your assembler code could be modified in the following fashion. I am keeping the stack frame code and the push/pop of RBX in your code, since I assume you were just giving us a minimal example and that your real code uses RBX. With that in mind the following code should work:
public funct
.data
ten REAL4 10.0 ; Define variable ten as 32-bit (4-byte float)
; REAL4 and DWORD are both same size.
; REAL4 makes for more readable code when using floats
.code
funct PROC
push rbp
mov rbp, rsp ; Setup stack frame
; RSP aligned to 16 bytes at this point
push rbx
mov [rbp+16],rcx ; 32 byte shadow space is just above the return address
; at RBP+16 (this address is 16 byte aligned). Rather
; than use a temporary variable in the data section to
; store the value of RCX, we just store it to the
; shadow space on the stack.
fild QWORD ptr[rbp+16] ; Load and convert 64-bit integer into st(0)
fld [ten] ; st(0) => st(1), st(0) = 10.0
fmulp ; st(1)=st(1)*st(0), st(1) => st(0)
fstp REAL4 ptr [rbp+16] ; Store result to shadow space as 32-bit float
movss xmm0, REAL4 ptr [rbp+16] ; Load scalar single (32-bit float) into XMM0
; XMM0 = return value for 32(and 64-bit) floats
; in 64-bit code.
pop rbx
mov rsp, rbp ; Remove stack frame
pop rbp
ret
funct ENDP
END
I've commented the code, but the thing that might be of interest is that I don't use a second variable in the DATA section. The 64-bit Windows calling convention requires the caller of a function to ensure the stack is aligned on a 16-byte boundary and that there is a 32-byte shadow space (AKA register parameter area) allocated before making a call. This area can be used as a scratch area. Since we set up a stack frame, the caller's saved RBP is at RBP+0, the return address is at RBP+8 and the scratch area starts at RBP+16. If you weren't using a stack frame then the return address would be at RSP+0, and the shadow space would start at RSP+8. We can store the result of our floating point operation there instead of in the QWORD you labelled bla.
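The offsets described above can be pictured as a struct laid over the stack starting at the address held in RBP. This is only an illustrative sketch: the names FrameView, return_address_offset and shadow_space_offset are hypothetical, and the layout assumes the push rbp / mov rbp, rsp prologue used in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the memory at [rbp+0] and above, right after
// "push rbp / mov rbp, rsp" in a Windows x64 callee.
struct FrameView {
    std::uint64_t saved_rbp;        // [rbp+0]  caller's RBP, pushed by the prologue
    std::uint64_t return_address;   // [rbp+8]  pushed by the CALL instruction
    std::uint64_t shadow_space[4];  // [rbp+16] 32 bytes reserved by the caller
};

// Offset of the return address relative to RBP.
std::size_t return_address_offset() {
    return offsetof(FrameView, return_address);
}

// Offset of the first shadow-space slot relative to RBP.
std::size_t shadow_space_offset() {
    return offsetof(FrameView, shadow_space);
}
```

The four shadow slots are where a callee would normally spill RCX, RDX, R8 and R9, which is why storing RCX at [rbp+16] is a natural fit.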
It is a reasonable idea to unwind the floating point stack so nothing remains on it before we exit our function. I use the FPU instructions that pop the registers once we are done with them (FMULP and FSTP).
The 64-bit Microsoft calling convention requires floating point values to be returned in XMM0. We use the SSE instruction MOVSS to move a scalar single (32-bit float) to the XMM0 register. That is where the C++ code will expect the value to be returned.
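When testing the assembly it can help to have a plain C++ function to compare against. The sketch below mirrors the FILD / FMULP / FSTP sequence step by step; funct_reference is a hypothetical name, and long double only matches the 80-bit x87 format on some compilers (for example GCC and Clang on x86, but not MSVC), so treat it as an approximation of the FPU path rather than an exact model.

```cpp
#include <cassert>

// Hypothetical reference implementation of funct for comparison purposes.
float funct_reference(long long n) {
    long double widened = static_cast<long double>(n);  // like FILD: integer -> FPU format
    long double product = widened * 10.0L;              // like FMULP: multiply by ten
    return static_cast<float>(product);                 // like FSTP REAL4: round to 32-bit float
}
```

For small inputs such as funct(3) every step is exact, so the assembly version and this reference should both yield 30.0f.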
Building on the ideas in the section above, we can modify the code to use SSE instructions with 32-bit floats. An example of such code is as follows:
public funct
.data
ten REAL4 10.0 ; Define variable ten as 32-bit (4-byte float)
; REAL4 and DWORD are both same size.
; REAL4 makes for more readable code when using floats
.code
funct PROC
push rbp
mov rbp, rsp ; Setup stack frame
; RSP aligned to 16 bytes at this point
push rbx
cvtsi2ss xmm0, rcx ; Convert scalar integer in RCX to
; scalar single(float) and store in XMM0
mulss xmm0, [ten] ; 32-bit float multiply by 10.0 store in XMM0
; XMM0 = return value for 32(and 64-bit) floats
; in 64-bit code.
pop rbx
mov rsp, rbp ; Remove stack frame
pop rbp
ret
funct ENDP
END
This code removes the usage of the x87 FPU by using SSE instructions. In particular we use:
cvtsi2ss xmm0, rcx ; Convert scalar integer in RCX to
; scalar single(float) and store in XMM0
CVTSI2SS converts a scalar integer to a scalar single (float). In this case the 64-bit integer value in RCX is converted to a 32-bit float and stored in XMM0. XMM0 is the register we'll be placing our returned value into. XMM0 to XMM5 are considered volatile so we don't need to save their values.
mulss xmm0, [ten] ; 32-bit float multiply by 10.0 store in XMM0
; XMM0 = return value for 32(and 64-bit) floats
; in 64-bit code.
MULSS is an SSE instruction that multiplies scalar singles (floats). In this case MULSS computes XMM0 = XMM0 * (32-bit float memory operand), which multiplies the value in XMM0 by the 32-bit float 10.0. Since XMM0 now also holds our final result, we have nothing more to do but properly exit the function.
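If you would rather stay in C++, the same two instructions are reachable through compiler intrinsics. This is a sketch, not a replacement for the assembly above: funct_sse is a hypothetical name, the intrinsics come from xmmintrin.h, and a plain C++ fallback is included for targets that are not 64-bit x86.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define FUNCT_HAVE_SSE_X64 1
#endif

// Hypothetical intrinsic-based counterpart of the SSE assembly version.
float funct_sse(long long n) {
#ifdef FUNCT_HAVE_SSE_X64
    __m128 value = _mm_cvtsi64_ss(_mm_setzero_ps(), n);  // CVTSI2SS: 64-bit int -> scalar single
    value = _mm_mul_ss(value, _mm_set_ss(10.0f));        // MULSS: multiply low float by 10.0
    return _mm_cvtss_f32(value);                         // low float of the register is the result
#else
    return static_cast<float>(n) * 10.0f;                // portable fallback, same arithmetic
#endif
}
```

A compiler will typically keep the result in XMM0 when returning a float on x64, which is the same register the hand-written version uses.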
This is a variation on the first example, but now we are using 64-bit floats, also known as the double type in C++, REAL8 (or QWORD) in assembler, and a scalar double in SSE2. Since we are now using double as the return type we have to modify the C++ code to be:
#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>
extern "C" double funct(long long n);
int main() {
double value1 = funct(3);
return 0;
}
The assembly code would look like:
public funct
.data
ten REAL8 10.0 ; Define variable ten as 64-bit (8-byte float)
; REAL8 and QWORD are both same size.
; REAL8 makes for more readable code when using floats
.code
funct PROC
push rbp
mov rbp, rsp ; Setup stack frame
; RSP aligned to 16 bytes at this point
push rbx
mov [rbp+16],rcx ; 32 byte shadow space is just above the return address
; at RBP+16 (this address is 16 byte aligned). Rather
; than use a temporary variable in the data section to
; store the value of RCX, we just store it to the
; shadow space on the stack.
fild QWORD ptr[rbp+16] ; Load and convert 64-bit integer into st(0)
fld [ten] ; st(0) => st(1), st(0) = 10.0
fmulp ; st(1)=st(1)*st(0), st(1) => st(0)
fstp REAL8 ptr [rbp+16] ; Store result to shadow space as 64-bit float
movsd xmm0, REAL8 ptr [rbp+16] ; Load scalar double (64-bit float) into XMM0
; XMM0 = return value for 32(and 64-bit) floats
; in 64-bit code.
pop rbx
mov rsp, rbp ; Remove stack frame
pop rbp
ret
funct ENDP
END
This code is nearly identical to the x87 code using 32-bit floats. We are using REAL8 (same size as QWORD) to store a 64-bit float and use MOVSD to move a 64-bit double float (scalar double) to XMM0. MOVSD is an SSE2 instruction. It is important to return the proper size float in XMM0. Had you used MOVSS, the value returned to the C++ function would likely be incorrect.
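To see why the wrong move instruction matters, the following sketch imitates what the caller would read if the 64-bit result were stored with FSTP REAL8 but then loaded with MOVSS. It assumes IEEE-754 doubles and relies on the fact that MOVSS from memory loads only the low 32 bits and zeroes the rest of the register; seen_by_caller_after_movss is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep only the low 32 bits of a double's bit pattern,
// as if the value had been loaded into XMM0 with MOVSS instead of MOVSD.
double seen_by_caller_after_movss(double stored) {
    std::uint64_t bits;
    std::memcpy(&bits, &stored, sizeof bits);    // raw 64-bit pattern of the double
    bits &= 0xFFFFFFFFull;                       // upper half is lost, low half survives
    double result;
    std::memcpy(&result, &bits, sizeof result);  // reinterpret what is left as a double
    return result;
}
```

For 30.0 the surviving low half is all zero bits, so the caller would see 0.0 instead of 30.0; other values generally turn into tiny denormal numbers.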
This is a variation on the second example, but now we are using 64-bit floats, also known as the double type in C++, REAL8 (or QWORD) in assembler, and a scalar double in SSE2. The C++ code should use the code from the previous section so that double is used instead of float. The assembler code would be similar to this:
public funct
.data
ten REAL8 10.0 ; Define variable ten as 64-bit (8-byte float)
; REAL8 and QWORD are both same size.
; REAL8 makes for more readable code when using floats
.code
funct PROC
push rbp
mov rbp, rsp ; Setup stack frame
; RSP aligned to 16 bytes at this point
push rbx
cvtsi2sd xmm0, rcx ; Convert scalar integer in RCX to
; scalar double(double float) and store in XMM0
mulsd xmm0, [ten] ; 64-bit float multiply by 10.0 store in XMM0
; XMM0 = return value for 32(and 64-bit) floats
; in 64-bit code.
pop rbx
mov rsp, rbp ; Remove stack frame
pop rbp
ret
funct ENDP
END
The primary difference from the second example is that we use CVTSI2SD instead of CVTSI2SS. SD in the instruction means we are converting to a scalar double (64-bit double float). Similarly we use the MULSD instruction for multiplication using scalar doubles. XMM0 will hold the 64-bit scalar double (double float) that will be returned to the calling function.
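The choice between the single and double versions is not only about matching the C++ return type; it also affects precision once the input no longer fits in a float's 24-bit significand. The sketch below performs the same arithmetic as the two SSE variants in plain C++; times_ten_single and times_ten_double are hypothetical names used only for this comparison.

```cpp
#include <cassert>

// Same arithmetic as the CVTSI2SS / MULSS version.
float times_ten_single(long long n) {
    return static_cast<float>(n) * 10.0f;
}

// Same arithmetic as the CVTSI2SD / MULSD version.
double times_ten_double(long long n) {
    return static_cast<double>(n) * 10.0;
}
```

With n = 16777217 (2^24 + 1) the conversion to float rounds the input to 16777216, so the single version returns 167772160 while the double version returns the exact 167772170.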
You could pass the address of the result as a parameter:
main.c:
#include<stdio.h>
extern "C" void funct(long long, float*);
int main ( void )
{
float value1 = 0; // float = DWORD ("double" would be QWORD)!
funct(3, &value1);
printf ("%f\n",value1);
return 0;
}
callee.asm:
.data
bla qword ?
bla2 qword 10.0
.code
funct PROC
push rbp
mov rbp, rsp
push rbx
mov bla,rcx
fild qword ptr[bla] ; -> st(1)
fld qword ptr [bla2] ; -> st(0)
fmul st(0), st(1)
fstp dword ptr [rdx] ; pop the first value
ffree st(0) ; free the second value (marks the register as empty)
pop rbx
pop rbp
ret
funct ENDP
END