What happens in the background when converting int to float

I don't understand how an int is cast to a float, step by step. Assume I have a signed integer in binary format, and I want to convert it to float by hand. However, I can't. Can someone show me how to do that conversion step by step?

I do that conversion in C many times, like this:

  int a = foo ( );
  float f = ( float ) a ;

But I haven't figured out what happens in the background. To understand it well, I want to do that conversion by hand.

EDIT: If you know a lot about conversions, you can also give information about float to double conversion, and about float to int.

Floating point values (IEEE754 ones, anyway) basically have three components:

  • a sign s;
  • a series of exponent bits e; and
  • a series of mantissa bits m.

The precision dictates how many bits are available for the exponent and mantissa. Let's examine the value 0.1 for single-precision floating point:

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 01111011 10011001100110011001101
           ||||||||||||||||||||||+- 8388608
           |||||||||||||||||||||+-- 4194304
           ||||||||||||||||||||+--- 2097152
           |||||||||||||||||||+---- 1048576
           ||||||||||||||||||+-----  524288
           |||||||||||||||||+------  262144
           ||||||||||||||||+-------  131072
           |||||||||||||||+--------   65536
           ||||||||||||||+---------   32768
           |||||||||||||+----------   16384
           ||||||||||||+-----------    8192
           |||||||||||+------------    4096
           ||||||||||+-------------    2048
           |||||||||+--------------    1024
           ||||||||+---------------     512
           |||||||+----------------     256
           ||||||+-----------------     128
           |||||+------------------      64
           ||||+-------------------      32
           |||+--------------------      16
           ||+---------------------       8
           |+----------------------       4
           +-----------------------       2

The sign is positive, that's pretty easy.

The exponent bits give 64+32+16+8+2+1 = 123. Subtracting the bias of 127 gives -4, so the multiplier is 2^-4 or 1/16. The bias is there so that you can get really small numbers (like 10^-30) as well as large ones.

The mantissa is chunky. It consists of 1 (the implicit base) plus the values of all the set bits, with each bit being worth 1/(2^n) as n starts at 1 and increases to the right: {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.

When you multiply that by the 2^-4 multiplier, you get 0.100000001490116119384765625, which is why they say you cannot represent 0.1 exactly as an IEEE754 float.
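The same decoding can be done in code. The following is only a sketch, assuming that float is a 32-bit IEEE754 type and that the value is a normal number; the FloatFields type and the names splitFloat and valueOf are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision value. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, still biased by 127 */
    uint32_t mantissa;  /* 23 bits, without the implicit leading 1 */
} FloatFields;

/* Split a float into its fields. Assumes float is 32-bit IEEE754. */
FloatFields splitFloat (float f) {
    uint32_t bits;
    FloatFields out;
    assert (sizeof f == sizeof bits);
    memcpy (&bits, &f, sizeof bits);    /* well-defined way to read the bits */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}

/* Rebuild the value of a normal number: (1 + m / 2^23) * 2^(e - 127). */
double valueOf (FloatFields p) {
    double value = 1.0 + (double) p.mantissa / 8388608.0;
    int e = (int) p.exponent - 127;     /* remove the bias */
    for (; e > 0; e--)
        value *= 2.0;
    for (; e < 0; e++)
        value /= 2.0;
    return p.sign ? -value : value;
}
```

For 0.1f, splitFloat gives a sign of 0, a biased exponent of 123 and the mantissa bits 0x4CCCCD, and valueOf turns those back into the same value that (double) 0.1f has.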

In terms of converting integers to floats, if the mantissa (including the implicit 1) has at least as many bits as the integer, you can just transfer the integer bit pattern over and select the correct exponent. There will be no loss of precision. For example, a double precision IEEE754 value (64 bits, 52/53 of those being mantissa) has no problem taking on a 32-bit integer.

If there are more bits in your integer (such as a 32-bit integer and a 32-bit single precision float, which only has 23/24 bits of mantissa) then you need to scale the integer.

This involves stripping off the least significant bits (rounding, actually) so that it will fit into the mantissa bits. That involves loss of precision of course, but that's unavoidable.
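Whether a particular integer survives the trip unchanged can be checked by counting how many bits lie between its highest and lowest set bits. This is a rough sketch, assuming a float with a 24-bit significand (which the assert checks); the names significantBits and fitsInFloat are invented for the example.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Number of bits from the highest set bit down to the lowest set bit. */
int significantBits (uint32_t v) {
    int width = 0;
    if (v == 0)
        return 0;
    while ((v & 1u) == 0)       /* trailing zeros are absorbed by the exponent */
        v >>= 1;
    while (v != 0) {
        width++;
        v >>= 1;
    }
    return width;
}

/* Nonzero when the magnitude fits in the significand of a float. */
int fitsInFloat (int32_t n) {
    uint32_t magnitude = n < 0 ? 0u - (uint32_t) n : (uint32_t) n;
    assert (FLT_MANT_DIG == 24);    /* assumption: IEEE754 single precision */
    return significantBits (magnitude) <= 24;
}
```

With this, 16777216 (2^24) fits, 16777217 does not, and 16777218 fits again because its lowest bit is zero.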


Let's have a look at a specific value, 123456789. The following program dumps the bits of each data type.

#include <stdio.h>

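/* Print the bits of an object, most significant byte first.
   Assumes a little-endian machine, where the last byte in memory is the
   most significant one. */
static void dumpBits (const char *desc, unsigned char *addr, size_t sz) {
    unsigned char mask;
    printf ("%s:\n  ", desc);
    while (sz-- != 0) {
        putchar (' ');
        /* Walk the bits of byte addr[sz] from bit 7 down to bit 0. */
        for (mask = 0x80; mask > 0; mask >>= 1)
            if (((addr[sz]) & mask) == 0)
                putchar ('0');
            else
                putchar ('1');
    }
    putchar ('\n');
}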

int main (void) {
    int intNum = 123456789;
    float fltNum = intNum;
    double dblNum = intNum;

    printf ("%d %f %f\n",intNum, fltNum, dblNum);
    dumpBits ("integer", (unsigned char *)(&intNum), sizeof (int));
    dumpBits ("float", (unsigned char *)(&fltNum), sizeof (float));
    dumpBits ("double", (unsigned char *)(&dblNum), sizeof (double));

    return 0;
}

The output on my system is as follows:

123456789 123456792.000000 123456789.000000
integer:
   00000111 01011011 11001101 00010101
float:
   01001100 11101011 01111001 10100011
double:
   01000001 10011101 01101111 00110100 01010100 00000000 00000000 00000000

And we'll look at these one at a time. First the integer, simple powers of two:

   00000111 01011011 11001101 00010101
        |||  | || || ||  || |    | | +->          1
        |||  | || || ||  || |    | +--->          4
        |||  | || || ||  || |    +----->         16
        |||  | || || ||  || +---------->        256
        |||  | || || ||  |+------------>       1024
        |||  | || || ||  +------------->       2048
        |||  | || || |+---------------->      16384
        |||  | || || +----------------->      32768
        |||  | || |+------------------->      65536
        |||  | || +-------------------->     131072
        |||  | |+---------------------->     524288
        |||  | +----------------------->    1048576
        |||  +------------------------->    4194304
        ||+---------------------------->   16777216
        |+----------------------------->   33554432
        +------------------------------>   67108864
                                         ==========
                                          123456789

Now let's look at the single precision float. Notice that the bit pattern of the mantissa is a near-perfect match for the integer:

mantissa:       11 01011011 11001101 00011    (spaced out).
integer:  00000111 01011011 11001101 00010101 (untouched).

There's an implicit 1 bit to the left of the mantissa, and it has also been rounded at the other end, which is where that loss of precision comes from (the value changing from 123456789 to 123456792, as in the output from the program above).

Working out the values:
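The rounding itself can be reproduced with integer arithmetic. 123456789 occupies 27 bits, so the three lowest bits (101, worth 5) have to go; since 5 is more than half of 8, the value rounds up to the next multiple of 8. The sketch below assumes round-to-nearest with ties going to the even significand, which is the IEEE754 default, and the name roundTo24Bits is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round the magnitude of a positive int to the nearest value that has at
   most 24 significant bits, ties going to the even significand. */
uint32_t roundTo24Bits (uint32_t v) {
    int width = 0;
    uint32_t tmp, unit, half, rest;

    assert (v <= 0x7FFFFFFFu);      /* assumption: v came from a positive int */
    for (tmp = v; tmp != 0; tmp >>= 1)
        width++;                    /* number of bits up to the highest set bit */
    if (width <= 24)
        return v;                   /* already fits, nothing to drop */

    unit = (uint32_t) 1 << (width - 24);    /* value of the last kept bit */
    half = unit >> 1;
    rest = v & (unit - 1);          /* the bits that get dropped */
    v -= rest;                      /* truncate */
    if (rest > half || (rest == half && (v & unit) != 0))
        v += unit;                  /* round up */
    return v;
}
```

Calling roundTo24Bits (123456789) gives 123456792, the same value the built-in conversion produced.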

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 10011001 11010110111100110100011
           || | || ||||  || |   |+- 8388608
           || | || ||||  || |   +-- 4194304
           || | || ||||  || +------  262144
           || | || ||||  |+--------   65536
           || | || ||||  +---------   32768
           || | || |||+------------    4096
           || | || ||+-------------    2048
           || | || |+--------------    1024
           || | || +---------------     512
           || | |+-----------------     128
           || | +------------------      64
           || +--------------------      16
           |+----------------------       4
           +-----------------------       2

The sign is positive. The exponent bits give 128+16+8+1 = 153. Subtracting the bias of 127 gives 26, so the multiplier is 2^26 or 67108864.

The mantissa is 1 (the implicit base) plus (as explained above) {1/2, 1/4, 1/16, 1/64, 1/128, 1/512, 1/1024, 1/2048, 1/4096, 1/32768, 1/65536, 1/262144, 1/4194304, 1/8388608}. When you add all these up, you get 1.83964955806732177734375.

When you multiply that by the 2^26 multiplier, you get 123456792, the same as the program output.

The bit pattern of the double is:

s eeeeeeeeeee mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
0 10000011001 1101011011110011010001010100000000000000000000000000

I am not going to go through the process of figuring out the value of that beast :-) However, I will show the mantissa next to the integer format to show the common bit representation:

mantissa:       11 01011011 11001101 00010101 000...000 (spaced out).
integer:  00000111 01011011 11001101 00010101           (untouched).

You can once again see the commonality, with the implicit bit on the left and the vastly greater bit availability on the right, which is why there's no loss of precision in this case.


In terms of converting between floats and doubles, that's also reasonably easy to understand.

You first have to check for the special values such as NaN and the infinities. These are indicated by special exponent/mantissa combinations, and it's probably easier to detect these up front and generate the equivalent in the new format.

Then, in the case where you're going from double to float, you obviously have less of a range available to you since there are fewer bits in the exponent. If your double is outside the range of a float, you need to handle that.

Assuming it will fit, you then need to:

  • rebase the exponent (the bias is different for the two types).
  • copy as many bits from the mantissa as will fit (rounding if necessary).
  • pad out the rest of the target mantissa (if any) with zero bits.
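As an illustration of those three steps, here is a sketch of the widening direction only (float to double), where no rounding is needed. It assumes 32-bit and 64-bit IEEE754 types and deliberately skips zero, denormalized numbers, infinities and NaN, which would be handled up front as described above. The name widenFloat is invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a normal float to a double by rearranging its bits.
   Returns 0 (and leaves *out alone) for zero, denormalized numbers,
   infinities and NaN, which this sketch does not handle. */
int widenFloat (float in, double *out) {
    uint32_t fbits;
    uint64_t sign, exponent, mantissa, dbits;

    assert (sizeof in == sizeof fbits && sizeof *out == sizeof dbits);
    memcpy (&fbits, &in, sizeof fbits);

    sign     = fbits >> 31;
    exponent = (fbits >> 23) & 0xFFu;
    mantissa = fbits & 0x7FFFFFu;
    if (exponent == 0 || exponent == 0xFF)
        return 0;                   /* special values need their own path */

    exponent += 1023 - 127;         /* rebase: bias 127 becomes bias 1023 */
    mantissa <<= 52 - 23;           /* copy to the top, low 29 bits stay zero */

    dbits = (sign << 63) | (exponent << 52) | mantissa;
    memcpy (out, &dbits, sizeof dbits);
    return 1;
}
```

The narrowing direction (double to float) follows the same outline, but additionally has to round the 29 dropped mantissa bits and deal with exponents that fall outside the float range.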

Conceptually this is quite simple. A float (in IEEE 754-1985) has the following representation:

  • 1 bit sign
  • 8 bits exponent (0 means zero or denormalized numbers, 1 means -126, 127 means 0, 255 means infinity or NaN)
  • 23 bits mantissa (the part that follows the "1.")

So the conversion is roughly:

  • determine the sign and the magnitude of the number
  • find the 24 most significant bits, properly rounded
  • adjust the exponent
  • encode these three parts into the 32-bit form

When implementing your own conversion, it's easy to test, since you can just compare the results to the built-in type conversion operator.
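A minimal sketch of such a hand-written conversion, together with a helper that exposes the bits of the built-in cast for comparison, might look like the following. It assumes a 32-bit IEEE754 float and the default round-to-nearest, ties-to-even mode; the names intToFloatBits and builtinFloatBits are made up here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the single-precision bit pattern for a 32-bit integer by hand. */
uint32_t intToFloatBits (int32_t n) {
    uint32_t sign, magnitude, mantissa, exponent;
    int top;

    if (n == 0)
        return 0;                       /* +0.0 is all zero bits */

    /* 1. sign and magnitude */
    sign = n < 0 ? 1u : 0u;
    magnitude = n < 0 ? 0u - (uint32_t) n : (uint32_t) n;

    /* 2. find the highest set bit; it becomes the implicit leading 1 */
    top = 31;
    while ((magnitude >> top) == 0)
        top--;

    /* 3. reduce to 24 significant bits, round to nearest, ties to even */
    if (top > 23) {
        int shift = top - 23;
        uint32_t rest = magnitude & (((uint32_t) 1 << shift) - 1);
        uint32_t half = (uint32_t) 1 << (shift - 1);
        mantissa = magnitude >> shift;
        if (rest > half || (rest == half && (mantissa & 1u) != 0))
            mantissa++;
        if ((mantissa >> 24) != 0) {    /* rounding carried into a 25th bit */
            mantissa >>= 1;
            top++;
        }
    } else {
        mantissa = magnitude << (23 - top);
    }

    /* 4. encode: biased exponent, mantissa without its leading 1 */
    exponent = (uint32_t) (top + 127);
    return (sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFFu);
}

/* Bits of the built-in conversion, for comparison. */
uint32_t builtinFloatBits (int32_t n) {
    float f = (float) n;
    uint32_t bits;
    assert (sizeof f == sizeof bits);
    memcpy (&bits, &f, sizeof bits);
    return bits;
}
```

For 123456789 both functions give 0x4CEB79A3. Looping over a range of inputs and comparing the two results is a cheap way to catch mistakes in the rounding step.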
