
Floating point number comparison trick: inline assembly

A long time ago, I used this simple x86 assembler trick to obtain 0 or 1 as the result of a floating-point comparison:

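fld   qword ptr [value1]   ; push value1 onto the x87 stack
fcomp qword ptr [value2]   ; compare ST(0) with value2, then pop so the stack stays balanced
fnstsw ax                  ; AX = FPU status word
mov   al, ah               ; AL = status bits 8..15; bit 0 is now C0
and   eax, 1               ; EAX = C0 = (value1 < value2) ? 1 : 0 (also 1 if unordered)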
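Why `mov al, ah` followed by `and eax, 1` yields the comparison result: FCOM and FCOMP record the outcome in the condition-code bits C3 (bit 14), C2 (bit 10) and C0 (bit 8) of the FPU status word, and C0 is set exactly when ST(0) is less than the operand or the operands are unordered. The sketch below is a software model of those bits and of the two-instruction extraction; it does not touch the real FPU, and the function names are illustrative only.

```cpp
#include <cmath>
#include <cstdint>

// Condition-code bits in the x87 FPU status word.
constexpr std::uint16_t kC0 = 1u << 8;
constexpr std::uint16_t kC2 = 1u << 10;
constexpr std::uint16_t kC3 = 1u << 14;

// Model of the condition codes FCOM/FCOMP set when comparing ST(0) with src.
std::uint16_t fcomConditionBits(double st0, double src) {
    if (std::isnan(st0) || std::isnan(src)) return kC3 | kC2 | kC0;  // unordered
    if (st0 < src) return kC0;
    if (st0 == src) return kC3;
    return 0;                                                         // st0 > src
}

// Model of "mov al, ah" followed by "and eax, 1" on a status word held in AX.
int extractC0(std::uint16_t statusWord) {
    std::uint32_t eax = statusWord;                 // fnstsw ax
    eax = (eax & ~0xFFu) | ((eax >> 8) & 0xFFu);    // mov al, ah: copy AH into AL
    return static_cast<int>(eax & 1u);              // and eax, 1: keep bit 0 of AL, i.e. C0
}
```

For example, `extractC0(fcomConditionBits(1.0, 2.0))` is 1 and `extractC0(fcomConditionBits(2.0, 1.0))` is 0. Other bits that land in AH, such as the TOP field (bits 11 to 13), are discarded by the final `and`.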

This trick avoids branching when the comparison result only selects one of two values. It was fast in the Pentium days; today it may not be much faster, but who knows.
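The "select one of two values" use case can also be expressed in portable C++ by using the comparison result directly as an array index. `selectByComparison` is a hypothetical helper, and whether the optimizer turns it into a branch, a conditional move or an indexed load is up to the compiler.

```cpp
// Hypothetical helper: returns ifLess when value1 < value2, otherwise ifNotLess.
inline double selectByComparison(double value1, double value2,
                                 double ifNotLess, double ifLess) {
    const double choices[2] = {ifNotLess, ifLess};
    return choices[value1 < value2];  // bool converts to index 0 or 1
}
```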

Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.

Can someone please help me rewrite this code for the two inline-assembler flavors (Intel and GCC)?

The required function prototype is:

inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }

Maybe SSE2 instructions would be even more efficient. What is your perspective?


I've tried this:

__asm__(
    "fcomq %2, %0\n"
    "fnstsw %ax\n"
    "fsubq %2, %0\n"
    "andq $L80, %eax\n"
    "shrq $5, %eax\n"
    "fmulq (%3,%eax), %0\n"
    : "=f" (penv)
    : "0" (penv), "F" (env), "r" (c)
    : "eax" );

But the Intel C++ Compiler reports this error: Floating point output constraint must specify a single register.
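GCC's documentation for x87 `asm` operands states that an output must name a constraint class with a single register, so `=f` (any x87 register) is rejected, while `=t` (top of the stack) is accepted. For this comparison, one way to sidestep the issue is to have no floating-point output at all: load `value1` into ST(0) with the `t` input constraint, let `fcompl` compare and pop it, and return bit 8 (C0) of the status word. A minimal sketch in GCC-style extended asm, assuming an x86 or x86-64 target; it has not been checked against the Intel compiler.

```cpp
// GCC-style extended asm sketch of the x87 trick (x86/x86-64 only).
inline int compareDoublesIndexed(const double value1, const double value2) {
    unsigned short status;
    __asm__("fcompl %2\n\t"  // compare ST(0) (value1) with value2 in memory, then pop
            "fnstsw %0"      // store the FPU status word in AX
            : "=a"(status)
            : "t"(value1), "m"(value2)
            : "st");         // value1 is popped by fcompl, so ST(0) must be listed as clobbered
    return (status >> 8) & 1;  // C0: set when value1 < value2 (or unordered)
}
```

Referring to AX through the `%0` operand also avoids the `%%ax` spelling that extended asm requires for literal register names.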

As you mentioned, things have changed since the Pentium days:

  • SSE, rather than x87, is now the preferred instruction set for floating point, even for scalar operations (on x86-64, doubles are passed and computed in XMM registers by default)
  • optimizing compilers are now very good

Therefore, first check what the compiler generates; you might be pleasantly surprised. I compiled the following code with g++ -O3:

fcmp.cpp:

int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

This is what the compiler generated (disassembled):

0000000000400690 <_Z21compareDoublesIndexeddd>:
  400690:       31 c0                   xor    %eax,%eax
  400692:       66 0f 2e c8             ucomisd %xmm0,%xmm1
  400696:       0f 97 c0                seta   %al
  400699:       c3                      retq   

This is what it means:

  xor     %eax,%eax        ; EAX = 0
  ucomisd %xmm0,%xmm1      ; compare value2 (in %xmm1) with value1 (in %xmm0)
  seta    %al              ; AL = value2 > value1 ? 1 : 0
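The flag logic behind this listing can be modeled in C++. UCOMISD sets ZF, PF and CF to 000 for greater, 001 for less, 100 for equal and 111 for unordered, and SETA writes 1 only when CF and ZF are both 0. This is an illustrative model with hypothetical names, not a way to read the real flags.

```cpp
#include <cmath>

// Flags written by Intel-syntax "ucomisd a, b" (AT&T: "ucomisd b, a").
struct UcomisdFlags {
    bool zf;
    bool pf;
    bool cf;
};

UcomisdFlags ucomisdModel(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
    if (a > b) return {false, false, false};
    if (a < b) return {false, false, true};
    return {true, false, false};                                     // equal
}

// SETA: 1 when the comparison was "above" (CF == 0 and ZF == 0), else 0.
int setaModel(const UcomisdFlags& f) { return (!f.cf && !f.zf) ? 1 : 0; }

// AT&T "ucomisd %xmm0,%xmm1" compares value2 (xmm1) against value1 (xmm0).
int compareDoublesIndexedModel(double value1, double value2) {
    return setaModel(ucomisdModel(value2, value1));
}
```

Because unordered operands set CF, a NaN on either side gives 0, matching `value1 < value2` in C++. The x87 snippet in the question returns 1 in that case, since C0 is set for unordered results.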

So the compiler avoided the conditional branch by using the seta instruction, which sets a byte to 1 if the preceding comparison was "above" (CF = 0 and ZF = 0) and to 0 otherwise.
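If you want to spell out the SSE2 comparison yourself instead of relying on the optimizer, the `_mm_ucomilt_sd` intrinsic from `<emmintrin.h>` compares the low doubles of two `__m128d` values with UCOMISD and returns 0 or 1. A minimal sketch, assuming an SSE2-capable x86 target; the function name is hypothetical, and NaN inputs are not considered here.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical SSE2 variant: 1 if value1 < value2, 0 otherwise (NaN inputs not considered).
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
    // _mm_set_sd places each scalar in the low lane and zeroes the high lane.
    return _mm_ucomilt_sd(_mm_set_sd(value1), _mm_set_sd(value2));
}
```

For ordinary operands this gives the same result as the plain `value1 < value2` version, so the intrinsic is mainly useful when the values already live in `__m128d` registers.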
