Why use anything other than a union for the IEEE 754 floating-point format?

Question

I have been studying ways to convert floating-point values (floats and doubles) to IEEE 754 in order to write routines that efficiently send and receive information across network connections (akin to the Perl pack/unpack functions). I have waded through the methods of creating the IEEE 754 representation described by Lockless, technical-recipes.com, Bit Twiddling, Bitwizardry, Haskell.org (C++) and the like, but I do not understand why those methods are any faster, more efficient, or better than just using a union to get the conversion. The union conversions involving integer/float or long/double seem like a far better way to let C worry about the sign, exponent and mantissa than doing it manually with shifts and rotations.

For example, with bit twiddling, you can manually create the IEEE 754 representation with:

#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint32_t, int32_t */

/* 23 bits of float fractional data */
#define I2F_FRAC_BITS   23
#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)

/* Find the log base 2 of an integer (MSB) */
int
getmsb (uint32_t word)
{
    int r = 0;  /* must start at 0: the fallback loop below only increments it */
#ifdef BUILD_64
    union { uint32_t u[2]; double d; } t;  // temp
    t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
    t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = word;
    t.d -= 4503599627370496.0;
    r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;
#else    
    while (word >>= 1)
    {
        r++;
    }
#endif  /* BUILD_64 */
    return r;
}

/* rotate to right */
inline uint32_t 
rotr (uint32_t value, int shift)
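{  /* mask the left-shift count so that shift == 0 does not shift by the full width (undefined) */
   return (value >> shift) | (value << ((sizeof (value) * CHAR_BIT - shift) & (sizeof (value) * CHAR_BIT - 1)));  }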

/* unsigned to IEEE 754 */
uint32_t
u2ieee (uint32_t x)
{
    uint32_t msb, exponent, fraction;


    if (!x) return 0;       /* Zero is special */
    msb = getmsb (x);       /* Get location of the most significant bit */
    fraction = rotr (x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK;
    exponent = (127 + msb) << I2F_FRAC_BITS;

    return fraction + exponent;
}

/* signed int to IEEE 754 */
uint32_t i2ieee (int32_t x)
{
        if (x < 0)
            return u2ieee (-(uint32_t) x) | 0x80000000;  /* negate as unsigned: -x overflows for INT32_MIN */
        return u2ieee (x);
}

At that point you can convert it to a hex or binary string, put it in a packet, and reverse the process on the other end. (Note that this is just the 32-bit case; similar functions are needed for 64-bit numbers.) Why do it this way? Why not just put the float or double in a union, which automatically stores it in IEEE 754 representation, and then simply use the int or long representation? It seems all cases could be handled by the following, which seems much less error-prone:

union uif { int i; float f; };
union uid { long int i; double d; };

int
f2ieee (float f) {
    union uif cvt;
    cvt.f = f;
    return cvt.i;
}

float
ieee32f (int i) {
    union uif cvt;
    cvt.i = i;
    return cvt.f;
}

long
d2ieee64 (double d) {
    union uid cvt;
    cvt.d = d;
    return cvt.i;
}

double
ieee64d (long int i) {
    union uid cvt;
    cvt.i = i;
    return cvt.d;
}

All of this has been good learning, but I'm missing the most important piece of all. Why do it one way instead of the other? What benefit does manual conversion provide when simply reading from a union is much less error-prone and, on its face, seems like it would be more efficient? What say the experts?

Answer 1

Your suggested "simpler" code does not do the same thing as the code you propose to replace. Your code is the correct way to convert a machine floating-point quantity (which conceivably might not be in IEEE format) to the same-size unsigned integer with the same representation. The "bit-twiddling" code you don't like is (if I understand it correctly) manually computing the IEEE-format floating-point quantity with the same numeric value as a given integer. Both of these operations are useful, but in different contexts. For instance, I'd expect to see your suggested code in the implementation of fpclassify on a CPU that has hardware IEEE floating point but no special instruction to classify values, and the "bit-twiddling" code in the implementation of a software floating-point library for a machine that doesn't have hardware floating point at all.
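To make the distinction concrete, here is a minimal sketch of the two operations side by side. It assumes float is a 32-bit IEEE 754 binary32 type, and the names float_bits and int_value_as_float_bits are made up for illustration. It uses memcpy rather than the union from the question, which is just another common way to copy a representation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Operation 1 (what the union code does): reinterpret.
   Return the bits already stored in f, unchanged. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Operation 2 (what u2ieee/i2ieee compute by hand): convert by value.
   The cast asks the compiler or FPU for the float with the same numeric
   value as x; only then are its bits inspected. */
uint32_t int_value_as_float_bits(int32_t x)
{
    return float_bits((float)x);
}
```

On such a platform the integer 1 has the bit pattern 0x00000001, while the float 1.0f has the bit pattern 0x3F800000. The first function only reveals a pattern that already exists; the second has to produce a new one, which is the job the bit-twiddling code does when no hardware or compiler support is available.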

It is unsafe to use bit-fields to extract fields of a floating-point value, because the C standard says that the order in which bit-fields are packed into a struct is implementation-defined (N1570: 6.7.2.1p11), meaning that compilers can choose any ordering they like. They are supposed to document what they do, but they don't have to pick an ordering that "makes sense". In particular, if you write a struct with bit-fields corresponding to the sign, exponent, and mantissa fields of an IEEE floating-point value, you cannot rely, across platforms, on those bit-fields lining up with the fields of an actual IEEE floating-point value. There really have been compilers that, for instance, packed bit-fields in the opposite direction from that expected by the target CPU's floating-point unit.

Now, in terms of the letter of the standard, this problem bites you worse if you use bit-shifts and masks to extract fields, because the value you get out of the conversion from a floating-point value to the same-size unsigned integer that you hope has the same representation is unspecified (N1570: 6.2.6.1p7), which is less nailed down than implementation-defined (but more nailed down than undefined). However, in practice, doing it this way is much more likely to work. (I can think of only one, thoroughly obsolete, context where it wouldn't work: some ARM-based systems in the early 1990s had third-party floating-point coprocessors that were big-endian, opposite to the main CPU's choice for integer values. In contrast, there have been dozens of compilers that used the "wrong" ordering for bit-fields; it has even been known to change upon minor upgrades.)
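As a rough illustration of the shift-and-mask approach, the sketch below pulls the three fields out of a float. It assumes float is IEEE 754 binary32 and that floats and same-size integers share a byte order (true on essentially all current hardware, per the caveat above). The struct ieee32_fields and the function split_float are hypothetical names, and memcpy stands in for the union from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three binary32 fields. */
struct ieee32_fields {
    uint32_t sign;      /* 1 bit   */
    uint32_t exponent;  /* 8 bits, still biased by 127 */
    uint32_t fraction;  /* 23 bits, without the implicit leading 1 */
};

struct ieee32_fields split_float(float f)
{
    uint32_t u;
    struct ieee32_fields out;

    assert(sizeof f == sizeof u);     /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);         /* same bits, viewed as an integer */

    out.sign     = u >> 31;           /* bit 31 */
    out.exponent = (u >> 23) & 0xFFu; /* bits 30..23 */
    out.fraction = u & 0x7FFFFFu;     /* bits 22..0 */
    return out;
}
```

Because the shifts and masks operate on the integer's value rather than on its layout in memory, the result does not depend on how a particular compiler orders bit-fields.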

(Have a look at Ada's "representation clauses" sometime, to see what it really takes to give the programmer the ability to align a record type with an external specification of the arrangement of bits in memory. C doesn't even come close.)

(If all you want is to convert from an integer to a float with the same value, and you're not tasked with implementing the compiler back end, you do it by simple assignment: double x = 1123581321; Going the other way you're probably looking for lrint and its friends.)
