
How to get the mantissa of an 80-bit long double as an int on x86-64

frexpl won't work because it keeps the mantissa as part of a long double. Can I use type punning, or would that be dangerous? Is there another way?

x86's float and integer endianness is little-endian, so the significand (aka mantissa) is the low 64 bits of an 80-bit x87 long double.

In assembly, you just load it the normal way, like mov rax, [rdi] when RDI points to the long double.

Unlike IEEE binary32 (float) or binary64 (double), the 80-bit long double stores the leading 1 of the significand explicitly (or a 0 for subnormals): https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format

So the unsigned integer value (magnitude) of the true significand is the same as what's actually stored in the object representation.
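To make the explicit leading bit concrete, here is a minimal sketch. It assumes gcc or clang targeting x86 / x86-64 (not MSVC), where long double is the 80-bit x87 type. The helper name stored_significand is hypothetical. It copies the low 8 bytes of the object representation and returns them unchanged.

```c
#include <assert.h>   // C11 static_assert
#include <float.h>    // LDBL_MANT_DIG, LDBL_MIN, LDBL_TRUE_MIN
#include <stdint.h>
#include <string.h>   // memcpy

// Assumption: x87 80-bit long double with a 64-bit significand.
static_assert(LDBL_MANT_DIG == 64, "needs an x87 80-bit long double");

// Hypothetical helper: return the raw 64-bit significand field. On
// little-endian x86 this is the first 8 bytes of the object, and it includes
// the explicit integer ("leading 1") bit at bit 63.
uint64_t stored_significand(long double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under these assumptions, 1.0L and LDBL_MIN, the smallest normal, both give 0x8000000000000000, with only the integer bit set. LDBL_TRUE_MIN, the smallest subnormal, gives 1 because its integer bit is 0. The sign is not part of these 8 bytes, so -1.0L gives the same value as 1.0L.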

If you want a signed int, too bad: including the sign bit it would be 65 bits, but int is only 32 bits on any x86 C implementation.

If you want int64_t, you could right shift by 1 to discard the low bit, making room for a sign bit. Then do 2's complement negation if the sign bit was set. That leaves you with a signed 2's complement representation of the significand value divided by 2. (IEEE FP uses sign/magnitude, with the sign bit at the top of the bit pattern.)
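The shift-then-negate recipe can be sketched as follows. This assumes the same x87 layout: significand in bytes 0-7, and sign bit plus 15-bit biased exponent in bytes 8-9, with the sign at the top. The function name ldbl_signed_half_mant is hypothetical.

```c
#include <assert.h>   // C11 static_assert
#include <float.h>
#include <stdint.h>
#include <string.h>

static_assert(LDBL_MANT_DIG == 64, "needs an x87 80-bit long double");
static_assert(sizeof(long double) >= 10, "x87 long double must be >= 10 bytes");

// Hypothetical helper: returns sign * (significand >> 1) as a 2's complement
// int64_t, giving up the lowest significand bit to make room for the sign.
int64_t ldbl_signed_half_mant(long double x)
{
    unsigned char bytes[sizeof(long double)];
    memcpy(bytes, &x, sizeof x);

    uint64_t mant;                                  // bytes 0-7: significand
    memcpy(&mant, bytes, sizeof mant);
    uint16_t sign_exp;                              // bytes 8-9: sign (bit 15) + exponent
    memcpy(&sign_exp, bytes + 8, sizeof sign_exp);

    // After the shift the value is at most 2^63 - 1, so both the conversion
    // and the negation below are well-defined.
    int64_t half = (int64_t)(mant >> 1);
    return (sign_exp & 0x8000u) ? -half : half;
}
```

For example, -1.0L has a stored significand of 0x8000000000000000, so under these assumptions it comes out as -0x4000000000000000.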


In C/C++, yes, you need to type-pun, e.g. with a union or memcpy. All C implementations on x86 / x86-64 that expose 80-bit floating point at all use a 12- or 16-byte type with the 10-byte value at the bottom.

Beware that MSVC uses long double = double, a 64-bit float, so check LDBL_MANT_DIG from float.h, or sizeof(long double). All 3 static_assert() statements trigger on MSVC, so they all did their job and saved us from copying a whole binary64 double (sign/exp/mantissa) into our uint64_t.

// valid C11 and C++11
#include <float.h>  // float numeric-limit macros
#include <stdint.h>
#include <assert.h>  // C11 static assert
#include <string.h>  // memcpy

// inline
uint64_t ldbl_mant(long double x)
{
    // we can assume CHAR_BIT = 8 when targeting x86, unless you care about DeathStation 9000 implementations.
    static_assert( sizeof(long double) >= 10, "x87 long double must be >= 10 bytes" );
    static_assert( LDBL_MANT_DIG == 64, "x87 long double significand must be 64 bits" );

    uint64_t retval;
    memcpy(&retval, &x, sizeof(retval));
    static_assert( sizeof(retval) < sizeof(x), "uint64_t should be strictly smaller than long double" ); // sanity check for wrong types
    return retval;
}

This compiles efficiently on gcc/clang/ICC (on Godbolt) to just one instruction (plus ret) as a stand-alone function, because the calling convention passes long double in memory. After inlining into code that has the long double in an x87 register, it will presumably compile to a TBYTE x87 store and an integer reload.

## gcc/clang/ICC -O3 for x86-64
ldbl_mant:
  mov rax, QWORD PTR [rsp+8]
  ret

For 32-bit, gcc has a weird redundant-copy missed-optimization bug that ICC and clang don't have; they just do the 2 loads from the function arg without copying first.

# GCC -m32 -O3  copies for no reason
ldbl_mant:
  sub esp, 28
  fld TBYTE PTR [esp+32]            # load the stack arg
  fstp TBYTE PTR [esp]              # store a local
  mov eax, DWORD PTR [esp]
  mov edx, DWORD PTR [esp+4]        # return uint64_t in edx:eax
  add esp, 28
  ret

C99 makes union type-punning well-defined behaviour, and so does GNU C++. I think MSVC defines it too.
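For comparison, a union-based version of the same extraction might look like the sketch below. It is written in C, where the text above says reading a different union member is defined. In ISO C++ (as opposed to GNU C++), reading the inactive member is undefined. The union and function names are hypothetical, and the sketch again assumes an x87 80-bit long double.

```c
#include <assert.h>   // C11 static_assert
#include <float.h>
#include <stdint.h>

static_assert(LDBL_MANT_DIG == 64, "needs an x87 80-bit long double");

// Hypothetical union: mant overlays the first 8 bytes of f, which on
// little-endian x86 hold the 64-bit significand.
union ldbl_pun {
    long double f;
    uint64_t mant;
};

uint64_t ldbl_mant_union(long double x)
{
    union ldbl_pun u;
    u.f = x;          // write the long double member...
    return u.mant;    // ...then read the overlapping integer member (C type punning)
}
```

This should return the same values as the memcpy-based ldbl_mant, e.g. 0x8000000000000000 for 1.0L.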

But memcpy is always portable, so it might be an even better choice, and it's easier to read in this case, where we just want one element.

If you also want the exponent and sign bit, a union between a struct and a long double might be good, except that padding for alignment at the end of the struct will make the struct bigger than 10 bytes. It's unlikely that there'd be padding after a uint64_t member before a uint16_t member, though. But I'd worry about :1 and :15 bitfields, because IIRC it's implementation-defined in which order the members of a bitfield are stored.
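One way to sidestep both concerns is to avoid bitfields and copy each field from its fixed byte offset with memcpy. Then neither the struct's trailing padding nor bitfield ordering matters. This is a sketch under the same x87 assumptions. The struct and helper names are hypothetical, and the sign and exponent are split out with plain shifts and masks.

```c
#include <assert.h>   // C11 static_assert
#include <float.h>
#include <stdint.h>
#include <string.h>

static_assert(LDBL_MANT_DIG == 64, "needs an x87 80-bit long double");
static_assert(sizeof(long double) >= 10, "x87 long double must be >= 10 bytes");

// Hypothetical struct with ordinary members only (no bitfields).
struct x87_parts {
    uint64_t mant;      // bits 0-63: significand, explicit integer bit at bit 63
    uint16_t sign_exp;  // bits 64-79: sign in bit 15, biased exponent in bits 0-14
};

struct x87_parts ldbl_split(long double x)
{
    struct x87_parts p;
    const unsigned char *bytes = (const unsigned char *)&x;
    // Copy each member from its fixed offset, so any padding in the
    // struct itself never overlaps the long double's bytes.
    memcpy(&p.mant, bytes, sizeof p.mant);
    memcpy(&p.sign_exp, bytes + 8, sizeof p.sign_exp);
    return p;
}

int ldbl_sign(struct x87_parts p)       { return p.sign_exp >> 15; }     // 0 or 1
int ldbl_biased_exp(struct x87_parts p) { return p.sign_exp & 0x7FFF; }  // bias is 16383
```

With these assumptions, 1.0L should split into a significand of 0x8000000000000000, a biased exponent of 16383 (0x3FFF) and a sign of 0. For -2.0L the exponent is 16384 and the sign is 1.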
