Floating point number comparison trick: inline assembly

A long time ago, I used this simple x86 assembler trick to obtain 0 or 1 as the result of a floating point number comparison:

fld [value1]
fcom [value2]
fnstsw ax
mov al, ah
and eax, 1

This trick avoids branching when the comparison result only selects one value from a set of 2 values. It was fast in the Pentium days; now it may not be much faster, but who knows.
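For illustration, the selection described above might look like this in C++. This is only a sketch: `selectByIndex` and its parameter names are hypothetical and not part of the original code, and whether the generated machine code is actually branch-free depends on the compiler and its settings.

```cpp
// Hypothetical helper (not from the original post): uses a 0-or-1
// comparison result as an array index, with no if/else in the source.
inline double selectByIndex(const int index, const double ifZero, const double ifOne) {
    const double choices[2] = { ifZero, ifOne };  // the "set of 2 values"
    return choices[index & 1];                    // index is expected to be 0 or 1
}
```

The 0 or 1 produced by the assembler sequence above would be passed as `index`.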

Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.

Can someone please help rewrite this code in the 2 inline assembler flavors (Intel and GCC)?

The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }

Maybe using SSE2 operations could be even more efficient. Your perspective?


I've tried this:

__asm__(
    "fcomq %2, %0\n"
    "fnstsw %ax\n"
    "fsubq %2, %0\n"
    "andq $L80, %eax\n"
    "shrq $5, %eax\n"
    "fmulq (%3,%eax), %0\n"
    : "=f" (penv)
    : "0" (penv), "F" (env), "r" (c)
    : "eax" );

But I get an error from the Intel C++ Compiler: Floating point output constraint must specify a single register.

As you mentioned, things have changed since the Pentium days:

  • SSE is now the preferred instruction set for floating point instead of x87, even for scalar operations
  • optimizing compilers are now very good

Therefore, first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:

fcmp.cpp:

int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

This is what the compiler generated:

0000000000400690 <_Z21compareDoublesIndexeddd>:
  400690:       31 c0                   xor    %eax,%eax
  400692:       66 0f 2e c8             ucomisd %xmm0,%xmm1
  400696:       0f 97 c0                seta   %al
  400699:       c3                      retq   

This is what it means:

  xor     %eax,%eax        ; EAX = 0
  ucomisd %xmm0,%xmm1      ; compare value2 (in %xmm1) with value1 (in %xmm0)
  seta    %al              ; AL = value2 > value1 ? 1 : 0

So the compiler avoided the conditional branch by using the seta instruction, which sets a byte to 1 if the comparison result is "above" and to 0 otherwise.
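If you would rather request a scalar SSE2 comparison explicitly than rely on the optimizer, intrinsics are one option that avoids inline assembly altogether. The sketch below assumes an x86 target with SSE2 enabled and falls back to plain C++ elsewhere; the name `compareDoublesIndexedSse2` is hypothetical, and I have not measured it against the compiler-generated version above.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical name; mirrors the prototype from the question.
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
    const __m128d a = _mm_set_sd(value1);  // value1 in the low lane
    const __m128d b = _mm_set_sd(value2);  // value2 in the low lane
    return _mm_comilt_sd(a, b);            // 1 if value1 < value2 (ordered operands), else 0
}
#else
// Fallback for targets without SSE2: let the compiler pick the instructions.
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
    return value1 < value2 ? 1 : 0;
}
#endif
```

In practice the plain C++ version is likely to be at least as good, since the compiler can inline it and fold the comparison into the surrounding code.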
