Floating-point comparison trick: inline assembly

Question

A long time ago, I used this simple x86 assembler trick to obtain 0 or 1 as the result of a floating-point comparison:

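fld [value1]      ; push value1 onto the x87 stack as ST(0)
fcom [value2]     ; compare ST(0) with value2; C0 is set if ST(0) < value2
fnstsw ax         ; copy the FPU status word into AX
mov al, ah        ; move the high byte (C0 is its bit 0) into AL
and eax, 1        ; keep only C0: EAX = 1 if value1 < value2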

This trick avoids branching when the comparison result only selects one value out of a set of two. It was fast in the Pentium days; now it may not be much faster, but who knows.
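To make the intended use concrete, here is a minimal C++ sketch of selecting between two values by index instead of by an explicit branch. The names `lessAsIndex` and `selectByIndex` are hypothetical and used only for illustration. Whether the generated machine code is actually branch-free depends on the compiler and its flags.

```cpp
// Hypothetical helpers for illustration; not part of the original post.

// Returns 1 if a < b, otherwise 0 (also 0 if either operand is NaN).
inline int lessAsIndex(double a, double b) {
    return static_cast<int>(a < b);
}

// Picks one of two values by using the comparison result as an array
// index, so the source contains no explicit if/else.
inline double selectByIndex(double a, double b,
                            double ifNotLess, double ifLess) {
    const double table[2] = { ifNotLess, ifLess };
    return table[lessAsIndex(a, b)];
}
```

For example, `selectByIndex(x, y, 10.0, 20.0)` yields 20.0 when `x < y` and 10.0 otherwise.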

Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.

Can someone please help rewrite this code in the two inline assembler flavors (Intel and GCC)?

The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }

Maybe using SSE2 operations could be even more efficient. What is your perspective?

I've tried this:

__asm__(
    "fcomq %2, %0\n"
    "fnstsw %ax\n"
    "fsubq %2, %0\n"
    "andq $L80, %eax\n"
    "shrq $5, %eax\n"
    "fmulq (%3,%eax), %0\n"
    : "=f" (penv)
    : "0" (penv), "F" (env), "r" (c)
    : "eax" );

But I get an error from the Intel C++ Compiler: Floating point output constraint must specify a single register.

Answer 1

As you mentioned, things have changed since the Pentium days:

SSE is now the preferred instruction set for floating point instead of x87, even for scalar operations
optimizing compilers are now very good

So first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:

fcmp.cpp:

int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

This is what the compiler generated:

0000000000400690 <_Z21compareDoublesIndexeddd>:
  400690:       31 c0                   xor    %eax,%eax
  400692:       66 0f 2e c8             ucomisd %xmm0,%xmm1
  400696:       0f 97 c0                seta   %al
  400699:       c3                      retq

This is what it means:

  xor     %eax,%eax        ; EAX = 0
  ucomisd %xmm0,%xmm1      ; compare value2 (in %xmm1) with value1 (in %xmm0)
  seta    %al              ; AL = value2 > value1 ? 1 : 0

So the compiler avoided the conditional branch by using the seta instruction (set the byte to 1 if the comparison result is "above", 0 otherwise).
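If you want to request an SSE2 scalar comparison explicitly rather than rely on the optimizer, a minimal sketch with compiler intrinsics could look like the following. The function name `compareDoublesIndexedSse2` and the macro `HAVE_SSE2_SKETCH` are hypothetical. `_mm_set_sd` and `_mm_comilt_sd` are standard SSE2 intrinsics from `<emmintrin.h>`. The fallback branch is only there so the code still builds on targets without SSE2. In practice the plain C++ version above is likely to compile to equivalent code, so measure before preferring this.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1
#endif

// Hypothetical name; returns 1 if value1 < value2, otherwise 0.
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
#ifdef HAVE_SSE2_SKETCH
    const __m128d a = _mm_set_sd(value1);  // value1 in the low lane
    const __m128d b = _mm_set_sd(value2);  // value2 in the low lane
    return _mm_comilt_sd(a, b);            // scalar compare of the low lanes
#else
    return value1 < value2 ? 1 : 0;        // portable fallback without SSE2
#endif
}
```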

Answer 1 (2, accepted, 2014-03-09 13:17:56)
