
Can someone explain this floating-point behavior?

Inspired by this question, I was trying to find out what exactly happens there (my answer was more intuitive, but I cannot exactly understand the why of it).

I believe it comes down to this (running 64 bit Python):

>>> sys.maxint
9223372036854775807
>>> float(sys.maxint)
9.2233720368547758e+18

Python uses the IEEE 754 floating-point representation, which effectively has 53 bits for the significand. However, as far as I understand it, the significand in the above example would require 57 bits (56 if you drop the implied leading 1) to be represented. Can someone explain this discrepancy?
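The 57-bit figure can be checked directly: the decimal digits printed in the repr, 92233720368547758, need 57 bits when treated as an integer. A quick sketch in Python 3 (where sys.maxint no longer exists, so the digits are written out as a literal):

```python
# The significant decimal digits shown in the repr of float(sys.maxint).
printed_significand = 92233720368547758

# Number of bits needed to represent that integer exactly.
print(printed_significand.bit_length())  # 57
```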

Perhaps the following will help clear things up:

>>> hex(int(float(sys.maxint)))
'0x8000000000000000L'

This shows that float(sys.maxint) is in fact a power of 2. Therefore, in binary its mantissa is exactly 1. In IEEE 754 the leading 1. is implied, so in the machine representation this number's mantissa consists of all zero bits.
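A small sketch confirming this (Python 3, where sys.maxint is gone, so 2**63 - 1 stands in for it on a 64-bit build):

```python
x = float(2**63 - 1)   # same value as float(sys.maxint) on 64-bit Python 2

# float.hex() shows the significand explicitly; a power of two
# has an all-zero fraction after the implied leading 1.
print(x.hex())        # 0x1.0000000000000p+63
print(x == 2.0**63)   # True
```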

In fact, the IEEE bit pattern representing this number is as follows:

0x43E0000000000000

Observe that only the first three nibbles (the sign and the exponent) are non-zero. The significand consists entirely of zeroes. As such it doesn't require 56 (nor indeed 53) bits to be represented.
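The raw bit pattern can be recovered with the standard struct module; a sketch, assuming a big-endian pack of the IEEE 754 double:

```python
import struct

# Reinterpret the double's 8 bytes as an unsigned 64-bit integer.
bits = struct.unpack('>Q', struct.pack('>d', float(2**63 - 1)))[0]
print(hex(bits))  # 0x43e0000000000000

# sign (1 bit) = 0, exponent (11 bits) = 0x43E = 1086 = 1023 + 63,
# significand (low 52 bits) = 0
print(bits & ((1 << 52) - 1))  # 0
```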

You're wrong: it requires only 1 bit.

>>> (9.2233720368547758e+18).hex()
'0x1.0000000000000p+63'

When you convert sys.maxint to a float or double, the result is exactly 0x1p63, because the significand contains only 24 or 53 bits (including the implicit bit), so the trailing bits cause a round up. (sys.maxint is 2^63 - 1, and rounding it up produces 2^63.)
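The round-up can be observed directly (a Python 3 sketch; 2**63 - 1 plays the role of sys.maxint):

```python
n = 2**63 - 1  # sys.maxint on 64-bit Python 2

# The low bits of n do not fit in a 53-bit significand,
# so conversion to double rounds up to exactly 2**63.
print(float(n) == 2.0**63)  # True
print(int(float(n)) - n)    # 1
```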

Then, when you print this float, some subroutine formats it as a decimal numeral. To do this, it calculates digits to represent 2^63. The fact that it is able to print 9.2233720368547758e+18 does not imply that the original number contains bits that would distinguish it from 9.2233720368547759e+18. It simply means that the bits in it do represent 9.2233720368547758e+18 (approximately). In fact, the next representable floating-point number in double precision is 9223372036854777856 (approximately 9.2233720368547778e+18), which is 2^63 + 2048. So the low 11 bits of these integers are not present in the double. The formatter merely displays the number as if those bits are zero.
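The neighbouring representable double can be found with math.nextafter (available since Python 3.9); a sketch:

```python
import math

x = 2.0**63
nxt = math.nextafter(x, math.inf)  # next representable double above 2**63

print(int(nxt))          # 9223372036854777856
print(int(nxt) - 2**63)  # 2048, i.e. one ulp = 2**(63 - 52)
```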
