
Unexpected floating-point representations in Python

Hello, I am using a dictionary in Python to store some cities and their populations, like this:

population = { 'Shanghai' : 17.8, 'Istanbul' : 13.3, 'Karachi' : 13.0, 'mumbai' : 12.5 }

Now if I use the command print population, I get the result:

{'Karachi': 13.0, 'Shanghai': 17.800000000000001, 'Istanbul': 13.300000000000001, 'mumbai': 12.5}

whereas if I use the command print population['Shanghai'], I get the initial input of 17.8.

My question to you is: how did 17.8 and 13.3 turn into 17.800000000000001 and 13.300000000000001, respectively? How was all that extra information produced? And why is it stored there, since my initial input suggests that I do not need it, at least as far as I know.

This was changed in Python 3.1. From the What's New page:

Python now uses David Gay's algorithm for finding the shortest floating point representation that doesn't change its value. This should help mitigate some of the confusion surrounding binary floating point numbers.

The significance is easily seen with a number like 1.1 which does not have an exact equivalent in binary floating point. Since there is no exact equivalent, an expression like float('1.1') evaluates to the nearest representable value which is 0x1.199999999999ap+0 in hex or 1.100000000000000088817841970012523233890533447265625 in decimal. That nearest value was and still is used in subsequent floating point calculations.
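You can inspect that nearest representable value yourself. A minimal sketch (Python 3 syntax; float.hex() and the decimal module are both standard library):

```python
from decimal import Decimal

x = float('1.1')
# The hex form shows the bits of the stored double directly.
print(x.hex())       # 0x1.199999999999ap+0
# Converting the float to Decimal expands the stored value exactly.
print(Decimal(x))    # 1.100000000000000088817841970012523233890533447265625
```

Decimal(x) is exact here because the conversion from a binary float to Decimal introduces no further rounding.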

What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001' . The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
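The old behavior can be reproduced by hand with the same format string (Python 3 syntax assumed):

```python
# 17 significant digits are enough to round-trip any IEEE-754 double.
old_style = format(1.1, '.17g')
print(old_style)                 # 1.1000000000000001
# The round-trip guarantee the old repr relied on:
assert float(old_style) == 1.1
```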

The new algorithm for repr(1.1) is smarter and returns '1.1' . Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.

The new algorithm tends to emit cleaner representations when possible, but it does not change the underlying values. So, it is still the case that 1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.
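A quick demonstration of that point (Python 3 syntax assumed):

```python
a = 1.1 + 2.2
# The sum's nearest double differs from 3.3's nearest double,
# and the shortest repr faithfully shows the difference.
print(repr(a))     # 3.3000000000000003
print(a == 3.3)    # False
# Tolerance-based comparison is the usual workaround.
print(abs(a - 3.3) < 1e-9)  # True
```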

The new algorithm depends on certain features in the underlying floating point implementation. If the required features are not found, the old algorithm will continue to be used. Also, the text pickle protocols assure cross-platform portability by using the old algorithm.

(Contributed by Eric Smith and Mark Dickinson; issue 1580 )

You need to read up on how floating-point numbers work in computers.

Basically, not all decimal numbers are possible to store exactly, and in those cases you will get the closest possible number. Sometimes this abstraction leaks, and you get to see the error.

This is probably due to differences in the printing logic used for the two use-cases you describe. I couldn't reproduce the behavior (using Python 2.7.2 on Win64).

If you use a number that is exactly representable, such as 1.5, I would expect the effect to go away.
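That guess is easy to check: 1.5 = 3/2 is a sum of powers of two, so it is stored exactly, while 17.8 is not. A small sketch (Python 3 syntax assumed):

```python
# An exactly representable value prints cleanly even with 17 digits...
print(format(1.5, '.17g'))    # 1.5
# ...while a non-representable one exposes its rounding.
print(format(17.8, '.17g'))   # 17.800000000000001
# The hex form confirms 1.5 is exact: 0x1.8p+0 with all-zero tail bits.
print((1.5).hex())            # 0x1.8000000000000p+0
```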

You have to use decimal.Decimal if you want to have the decimal represented exactly as you specified it on any machine in the world.

See the Python manual for information: http://docs.python.org/library/decimal.html

>>> from decimal import Decimal
>>> print Decimal('3.14')
3.14
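Because Decimal keeps the digits you typed, decimal arithmetic also behaves the way the question expects. A minimal example (Python 3 syntax, unlike the Python 2 snippet above):

```python
from decimal import Decimal

# Construct from strings so no binary rounding ever happens.
total = Decimal('1.1') + Decimal('2.2')
print(total)                    # 3.3
print(total == Decimal('3.3'))  # True
```

Note that you should construct Decimal from a string, not a float: Decimal(1.1) would faithfully capture the float's binary approximation rather than the value you typed.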
