A long time ago, I used this simple x86 assembly trick to obtain 0 or 1 as the result of a floating-point comparison:
fld [value1]
fcom [value2]
fnstsw ax
mov al, ah
and eax, 1
This trick avoids branching when the comparison result only selects one value out of a set of two. It was fast in the Pentium days; it may not be much faster now, but who knows.
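To illustrate the kind of selection meant here, a minimal C++ sketch follows. The helper name selectByComparison and its parameters are made up for this example. Whether the generated machine code is actually branch-free depends on the compiler and its optimization flags.

```cpp
// Hypothetical helper (not from the original post): picks one of two values
// by using the 0/1 comparison result as an array index instead of an if/else.
inline double selectByComparison(const double value1, const double value2,
                                 const double ifNotLess, const double ifLess) {
    const double table[2] = { ifNotLess, ifLess };
    const int index = value1 < value2;   // bool converts to 0 or 1
    return table[index];
}
```

For example, selectByComparison(a, b, x, y) yields y when a < b and x otherwise.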
Now I mainly use C++ and compile using Intel C++ Compiler or GCC C++ Compiler.
Can someone please help me rewrite this code in both inline assembler flavors (Intel and GCC)?
The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }
Maybe using SSE2 operations could be even more efficient. Your perspective?
I've tried this:
__asm__(
"fcomq %2, %0\n"
"fnstsw %ax\n"
"fsubq %2, %0\n"
"andq $L80, %eax\n"
"shrq $5, %eax\n"
"fmulq (%3,%eax), %0\n"
: "=f" (penv)
: "0" (penv), "F" (env), "r" (c)
: "eax" );
But I get an error from the Intel C++ Compiler: Floating point output constraint must specify a single register.
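That message most likely refers to a rule of GCC-style extended asm: an x87 output operand has to name one specific stack register ("=t" for the top of the stack, "=u" for the one below it), so "=f" is not accepted as an output constraint. The following is only a sketch of the original fcom/fnstsw trick in GCC syntax, not a fix for the longer snippet above. It assumes a GCC-compatible compiler on x86 or x86-64 and falls back to a plain comparison elsewhere.

```cpp
// Sketch under stated assumptions: GCC-style extended asm, x86/x86-64 only.
inline int compareDoublesIndexed(const double value1, const double value2) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short status;            // receives the x87 status word through AX
    __asm__ ("fcoml %2\n\t"           // compare ST(0) (value1) with value2 in memory
             "fnstsw %0"              // store the x87 status word into AX
             : "=a"(status)           // output: AX
             : "t"(value1),           // input: top of the x87 stack
               "m"(value2));          // input: 64-bit memory operand
    return (status >> 8) & 1;         // C0 is bit 8 of the status word
#else
    return value1 < value2 ? 1 : 0;   // portable fallback
#endif
}
```

Note that with a NaN operand the x87 path sets C0 and therefore returns 1, while the fallback returns 0, so the two paths only agree for ordered values.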
As you mentioned, things have changed since the Pentium days. So first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:
fcmp.cpp:
int compareDoublesIndexed( const double value1, const double value2 ) {
return value1 < value2 ? 1 : 0;
}
This is what the compiler generated:
0000000000400690 <_Z21compareDoublesIndexeddd>:
400690: 31 c0 xor %eax,%eax
400692: 66 0f 2e c8 ucomisd %xmm0,%xmm1
400696: 0f 97 c0 seta %al
400699: c3 retq
This is what it means:
xor %eax,%eax ; EAX = 0
ucomisd %xmm0,%xmm1 ; compare value2 (in %xmm1) with value1 (in %xmm0)
seta %al ; AL = value2 > value1 ? 1 : 0
So the compiler avoided the conditional branch by using the seta
instruction (set byte to 1 if the comparison result is "above", that is CF = 0 and ZF = 0, and to 0 otherwise).
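If you want to request the SSE2 scalar compare explicitly rather than rely on the optimizer, the intrinsics from <emmintrin.h> are one option. This is a rough sketch: the function name compareDoublesIndexedSse2 is made up here, and the code falls back to the plain comparison when the compiler does not define __SSE2__.

```cpp
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics
#endif

// Hypothetical name; same contract as compareDoublesIndexed for ordered inputs.
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
#if defined(__SSE2__)
    const __m128d a = _mm_set_sd(value1);   // value1 in the low lane
    const __m128d b = _mm_set_sd(value2);   // value2 in the low lane
    return _mm_ucomilt_sd(a, b);            // 1 if low(a) < low(b), based on ucomisd
#else
    return value1 < value2 ? 1 : 0;         // fallback without SSE2
#endif
}
```

In practice the plain C++ comparison shown earlier is usually enough, since the compiler already picks ucomisd and a set instruction by itself.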