
Java floating point clarification

I am reading Java Puzzlers by Joshua Bloch. In Puzzle 28, I am not able to understand the following paragraph:

This works because the larger a floating-point value, the larger the distance between the value and its successor. This distribution of floating-point values is a consequence of their representation with a fixed number of significant bits. Adding 1 to a floating-point value that is sufficiently large will not change the value, because it doesn't "bridge the gap" to its successor.

  1. Why do larger floating point values have larger distances between their values and successors?
  2. In the case of an integer, we add one to get the next integer, but in the case of a float, how do we get the next float value? If I have a float value in IEEE-754 format, do I add 1 to the mantissa part to get the next float?

Imagine a decimal-based format where you are only allowed to set the first 5 digits (i.e. your mantissa has length 5). For small numbers you would be fine: 1.0000, 12.000, 125.00.

But for larger numbers you would have to start rounding, e.g. 1113500. The next representable number would be 1113600, which is 100 larger. Any values in between just can't be represented in this format. If you were reading in a value in this range, you would have to round it: find the closest representable value, even if it is not exact.

The problem gets worse the larger the number is. If I reach 34567000000 then the next representable number will be 34568000000, which is a gap of 1000000, or one million. In this way, you can see that the difference between representations depends on the size.

At the other extreme, for a small value such as 0.00010000, the next representable value is 0.00010001, so the gap is just 0.00000001.
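The five-digit decimal format described above can be imitated with BigDecimal, as a rough sketch. The class name FiveDigitDecimal and its helper methods are made up for this illustration, and the inputs are written with at least five digits because BigDecimal.round does not pad shorter values.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class FiveDigitDecimal {
    // Hypothetical toy format: keep only 5 significant decimal digits.
    static final MathContext FIVE_DIGITS = new MathContext(5, RoundingMode.HALF_EVEN);

    // Rounds a value to the nearest number the toy format can hold.
    static BigDecimal toFiveDigits(String value) {
        return new BigDecimal(value).round(FIVE_DIGITS);
    }

    // Size of one step in the last kept digit at this magnitude.
    static BigDecimal gap(String value) {
        return toFiveDigits(value).ulp();
    }

    public static void main(String[] args) {
        String[] samples = {"1.0000", "125.00", "1113500", "1113549", "34567000000"};
        for (String s : samples) {
            System.out.println(s + " is stored as " + toFiveDigits(s).toPlainString()
                    + ", gap to the next value = " + gap(s).toPlainString());
        }
    }
}
```

The gap printed for each sample grows with the magnitude of the value, and 1113549 collapses onto 1113500 because nothing in between can be stored.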

Floating point values have the same principle, but with a binary encoding (powers of two instead of powers of ten).

You can think of floating point as base-2 scientific notation. In floating point, you are limited to a fixed number of bits for the mantissa (aka significand) and for the exponent. How many depends on whether you are using a float (24-bit significand, 8-bit exponent) or a double (53-bit significand, 11-bit exponent); the significand counts include the implied leading bit.
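To see the effect of the fixed significand width, the standard Math.ulp method reports the distance from a value to the next larger one. The sketch below prints that distance for a few magnitudes in both float and double; the class name UlpByMagnitude and the sample values are chosen only for this illustration.

```java
class UlpByMagnitude {
    // Distance from x to the next larger float.
    static float floatGap(float x) {
        return Math.ulp(x);
    }

    // Distance from x to the next larger double.
    static double doubleGap(double x) {
        return Math.ulp(x);
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 1_000.0, 16_777_216.0, 1.0e15};
        for (double x : samples) {
            System.out.println("x = " + x
                    + "  float gap = " + floatGap((float) x)
                    + "  double gap = " + doubleGap(x));
        }
    }
}
```

At 16777216 (2^24) the float gap is already 2, so not every integer is representable as a float from there on; a double reaches the same point at 2^53.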

It's a little more familiar to think of base-10 scientific notation. Imagine that the mantissa is limited to an integer and is always represented by 3 significant digits. Now consider these two pairs of successive numbers in this representation:

  • 100 x 10^0 and 101 x 10^0 (100 and 101)
  • 100 x 10^1 and 101 x 10^1 (1000 and 1010)

Note that the distance (aka difference) between the numbers in the first pair is 1, while with the second pair it is 10. In both pairs, the mantissas differ by 1, which is the smallest difference there can be between integers, but the difference is scaled by the exponent. That's why larger numbers have bigger steps between them in floating point (your first question).

Regarding the second question, let's look at adding 1 (100 x 10^-2) to the number 1000 (100 x 10^1):

  • 100 x 10^1 + 100 x 10^-2 = 1001 x 10^0

but we are limited to only three significant digits in the mantissa, so the last number gets normalized (after rounding) to:

  • 100 x 10^1

which leaves us back at 1000. To change a floating point value, you need to add at least half the difference between that number and the next number; this minimum difference varies with the scale of the number.
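The "at least half the difference" rule can be checked directly in Java. This is a small sketch, assuming the default IEEE 754 round-to-nearest behaviour of float addition; the class HalfGapDemo and the method isAbsorbed are names invented for the example.

```java
class HalfGapDemo {
    // Returns true when x + delta rounds back to x, i.e. the addition is lost.
    static boolean isAbsorbed(float x, float delta) {
        return x + delta == x;
    }

    public static void main(String[] args) {
        float x = 1.0e9f;          // exactly representable as a float
        float gap = Math.ulp(x);   // 64.0 at this magnitude
        System.out.println("gap = " + gap);
        System.out.println("x + 1 absorbed?         " + isAbsorbed(x, 1.0f));
        System.out.println("x + gap/4 absorbed?     " + isAbsorbed(x, gap / 4));
        System.out.println("x + 3*gap/4 absorbed?   " + isAbsorbed(x, 3 * gap / 4));
        System.out.println("x + gap absorbed?       " + isAbsorbed(x, gap));
    }
}
```

Additions smaller than half the gap are absorbed, and additions larger than half the gap move the result to the next float. Adding exactly half the gap is a tie, which round-to-nearest resolves toward the neighbour whose last significand bit is zero, so it may or may not change the value.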

Exactly the same kind of thing is going on with binary floating point. There are more details (e.g., normalization, guard digits, implied radix point, implied bit), which you can read about in the excellent article "What Every Computer Scientist Should Know About Floating-Point Arithmetic".

  1. Floating point numbers are represented as a combination of mantissa and exponent, where the value of the number is mantissa * 2^(exponent). If we assume the mantissa is limited to 2 digits (to make things simpler) and you have the number 1.1 * 2^100, which is very large, the "next" value would be 1.2 * 2^100. So if you're doing mixed-scale calculations, 1.1 * 2^100 + 1 will be rounded back to 1.1 * 2^100, since there's not enough space in the mantissa to retain the accurate result.
  2. Starting with Java 6 you have the utility methods Math.nextUp() and Math.nextAfter(), which allow you to "iterate" over all possible double/float values. Before that, you need to add 1 to the mantissa, and possibly take care of overflow, to get the next/previous value.
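The two approaches from the second point can be compared side by side. The sketch below assumes positive, finite, non-NaN inputs, for which adding 1 to the raw bit pattern gives the same result as Math.nextUp; a full mantissa simply carries into the exponent field. The class NextFloatDemo and the method nextByBits are hypothetical names for this example.

```java
class NextFloatDemo {
    // Manual successor via the raw bit pattern.
    // Assumption of this sketch: f is positive, finite and not NaN.
    static float nextByBits(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) + 1);
    }

    public static void main(String[] args) {
        float justBelowTwo = Float.intBitsToFloat(0x3FFFFFFF); // mantissa bits all set
        float[] samples = {1.0f, justBelowTwo, 16_777_216.0f, Float.MIN_VALUE};
        for (float f : samples) {
            float viaLibrary = Math.nextUp(f);
            float viaBits = nextByBits(f);
            System.out.println(f + " -> " + viaLibrary
                    + " (bit increment agrees: " + (viaLibrary == viaBits) + ")");
        }
    }
}
```

Negative values, infinities and NaN need separate handling in the manual version, which is one reason to prefer Math.nextUp and Math.nextAfter where they are available.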

Although it does not explain the why, this sample code shows how to calculate the distance between a float and the next available float and gives an example for a large number. f and g should be Integer.MAX_VALUE apart but they are the same. And the next value is h , which is 1099511627776 larger.

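import java.math.BigDecimal;

public class FloatGap {
    public static void main(String[] args) {
        // Long.MAX_VALUE (2^63 - 1) rounds to 2^63 when converted to float
        float f = Long.MAX_VALUE;
        System.out.println("f = " + new BigDecimal(f));
        System.out.println("f bits = " + Float.floatToIntBits(f));
        // Subtracting about 2^31 is far less than half the gap, so g rounds back to f
        float g = f - Integer.MAX_VALUE;
        System.out.println("g = f - Integer.MAX_VALUE = " + new BigDecimal(g));
        System.out.println("g bits = " + Float.floatToIntBits(g));
        System.out.println("f == g? " + (f == g));
        // Incrementing the raw bit pattern yields the next larger float
        float h = Float.intBitsToFloat(Float.floatToIntBits(f) + 1);
        System.out.println("h = " + new BigDecimal(h));
        System.out.println("h bits = " + Float.floatToIntBits(h));
        System.out.println("h - f = " + new BigDecimal(h).subtract(new BigDecimal(f)));
    }
}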

outputs:

f = 9223372036854775808
f bits = 1593835520
g = f - Integer.MAX_VALUE = 9223372036854775808
g bits = 1593835520
f == g? true
h = 9223373136366403584
h bits = 1593835521
h - f = 1099511627776
