
Where's the 24th fraction bit on a single precision float? IEEE 754

I found myself today doing some bit manipulation and I decided to refresh my floating-point knowledge a little!

Things were going great until I saw this:

... 23 fraction bits of the significand appear in the memory format but the total precision is 24 bits

I read it again and again but I still can't figure out where the 24th bit is. I noticed something about a binary point, so I assumed that it's a point in the middle between the mantissa and the exponent.

I'm not really sure, but I believe the author was talking about this bit:

         Binary point?
             |
s------e-----|-------------m----------
0 - 01111100 - 01000000000000000000000
           ^ this

The 24th bit is implicit due to normalization.

The significand is shifted left (and one subtracted from the exponent for each bit shift) until the leading bit of the significand is a 1.

Then, since the leading bit is a 1, only the other 23 bits are actually stored.
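As a rough illustration of where the 24th bit comes from, here is a minimal Python sketch that splits 0.15625 (the value whose bit pattern appears in the question's diagram) into its three stored fields and then puts the implicit leading 1 back in front of the 23 stored fraction bits. The helper name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x):
    """Round x to IEEE 754 single precision and return (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # the 8 stored exponent bits
    fraction = bits & 0x7FFFFF       # the 23 stored fraction bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(0.15625)

# The 24th bit is never stored: for a normal number it is always 1,
# so we restore it just above the 23 stored fraction bits.
significand24 = (1 << 23) | fraction

print(f"{sign} {exponent:08b} {fraction:023b}")  # the three stored fields
print(f"{significand24:024b}")                   # the full 24-bit significand
```

The first line printed holds the same three fields as the diagram in the question; the second has one more bit than is stored, and that extra leading 1 is the "missing" 24th bit.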

There is also the possibility of a denormal number. The exponent is stored as a "bias" format signed number, meaning that it's an unsigned number where a value near the middle of the range is defined to mean 0 [1]. So, with 8 bits, it's stored as a number from 0..255 with a bias of 127: a stored 127 is interpreted to mean 0, a stored 1 is interpreted to mean -126, and a stored 254 is interpreted as 127. The stored values 0 and 255 are reserved: 0 marks zero and denormals, and 255 marks infinities and NaNs.
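To make the bias concrete, the sketch below prints the stored exponent field and the exponent it stands for, for a few single-precision values. It assumes the single-precision bias of 127; `stored_exponent` is a helper invented for this example.

```python
import struct

BIAS = 127  # exponent bias for IEEE 754 single precision

def stored_exponent(x):
    """Return the raw 8-bit exponent field of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF

for value in (0.15625, 0.5, 1.0, 2.0):
    stored = stored_exponent(value)
    # The actual exponent is the stored field minus the bias.
    print(value, stored, stored - BIAS)
```

For 1.0 the stored field is 127 and the actual exponent is 0; for 0.15625 the stored field is 124 (`01111100`), giving an exponent of -3.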

If, in the process of normalization, the stored exponent is decremented to 0, then normalization stops, and the significand is stored as-is. In this case, the implicit bit from normalization is taken to be a 0 instead of a 1, and the value is scaled by 2^-126, the same exponent as the smallest normal number.
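A small sketch of the denormal case, assuming single precision: with an exponent field of 0 the implicit bit is 0 and the scale is fixed at 2^-126, so the value is just the 23-bit fraction times 2^-149. The function name `denormal_value` is hypothetical; the result is checked against Python's own reinterpretation of the same bit pattern.

```python
import struct

def denormal_value(fraction):
    """Value of a positive single-precision denormal with this 23-bit fraction."""
    if not 0 <= fraction <= 0x7FFFFF:
        raise ValueError("fraction must fit in 23 bits")
    # Implicit bit is 0 (0.fraction instead of 1.fraction), exponent fixed at -126.
    return (fraction / 2**23) * 2.0**-126

def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

# Exponent field 0, so the whole bit pattern is just the fraction.
for fraction in (0x000001, 0x000100, 0x7FFFFF):
    print(fraction, denormal_value(fraction) == bits_to_float32(fraction))
```

The smallest pattern, a fraction of 1, decodes to 2^-149, and it has only one significant bit left rather than 24.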

Most floating point hardware is designed to basically assume numbers will be normalized, so it assumes that the implicit bit is a 1. During the computation, it checks for the possibility of a denormal number, and in that case it does roughly the equivalent of throwing an exception and restarts the calculation with that taken into account. This is why computation with denormals often gets drastically slower than otherwise.


  1. In case you wonder why it uses this strange format: IEEE floating point (like many others) is designed to ensure that if you treat its bit pattern as an integer of the same size, you can compare them as signed, 2's complement integers and they'll still sort into the correct order as floating point numbers. (Strictly, this holds when at least one operand is non-negative; floats are sign-magnitude, so two negative floats compare in reverse order as integers.) The sign of the number is in the most significant bit (where it is for a 2's complement integer), so that's treated as the sign bit. The bits of the exponent are stored as the next most significant bits, but if we used 2's complement for them, an exponent less than 0 would set the second most significant bit of the number, which would result in what looked like a big number as an integer. By using bias format, a smaller exponent leaves that bit clear, and a larger exponent sets it, so the order as an integer reflects the order as a floating point number.
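The ordering property can be checked with a short sketch. It reinterprets each single-precision bit pattern as a signed 32-bit integer (the helper `float32_as_int` is made up for this example) and sorts a handful of non-negative values by that integer. The last line shows the caveat for negative values, which are stored as sign-magnitude.

```python
import struct

def float32_as_int(x):
    """Reinterpret x (rounded to single precision) as a signed 32-bit integer."""
    (as_int,) = struct.unpack(">i", struct.pack(">f", x))
    return as_int

# Non-negative values, including zero, a denormal, and infinity.
values = [2.0, 1e-40, float("inf"), 0.15625, 0.0, 1e30, 1.5, 1.0]

by_int = sorted(values, key=float32_as_int)
print(by_int == sorted(values))  # integer order matches floating point order

# Two negative floats compare the "wrong" way round as integers.
print(float32_as_int(-1.0) < float32_as_int(-2.0))
```

Both lines print `True`: the non-negative values sort identically either way, while -1.0 maps to a smaller integer than -2.0 even though it is the larger float.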

Normally (pardon the pun), the leading bit of a floating point number is always 1; thus, it doesn't need to be stored anywhere. The reason is that, if it weren't 1, that would mean you had chosen the wrong exponent to represent it; you could get more precision by shifting the mantissa bits left and using a smaller exponent.

The one exception is denormal/subnormal numbers, which are represented by all zero bits in the exponent field (the lowest possible exponent). In this case, there is no implicit leading 1 in the mantissa, and you have diminishing precision as the value approaches zero.

For normal floating point numbers, the number stored in the floating point variable is (ignoring sign) 1.mantissa * 2^(exponent-offset). The leading 1 is not stored in the variable.
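As a sketch of that formula for single precision (assuming an offset of 127 and a 23-bit mantissa field), the code below rebuilds a value from its raw fields and compares it with the value Python gets from the same bits. The function name `normal_float32_value` is invented for this example, and the bit pattern is the one from the question's diagram.

```python
import struct

OFFSET = 127  # exponent offset (bias) for single precision

def normal_float32_value(bits):
    """Apply sign * 1.mantissa * 2^(exponent-offset) to a normal float32 bit pattern."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if not 1 <= exponent <= 254:
        raise ValueError("not a normal number (exponent field is 0 or 255)")
    # 1.mantissa: the leading 1 is added here because it is not stored.
    return sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - OFFSET)

# sign = 0, exponent = 01111100, mantissa = 01 followed by 21 zeros
bits = (0 << 31) | (0b01111100 << 23) | (0b01 << 21)

(expected,) = struct.unpack(">f", struct.pack(">I", bits))
print(normal_float32_value(bits), expected)
```

Here the mantissa field contributes 0.25, so the significand is 1.25, and 1.25 * 2^(124-127) gives 0.15625.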
