
Can anyone explain this floating point weirdness to me?

I was trying to loop through all possible values of a float like this:

float i = 0.0F;
float epsilon = float.Epsilon;
while (i != float.MaxValue) {
    i += epsilon;
}

but after reaching the value 2.3509887E-38F it stops increasing.

float init = 2.3509887E-38F;
float f = (init + float.Epsilon);
Console.WriteLine(f == init);

I'm just curious, can anyone explain exactly why?

So, I can add epsilon to a float 16777216 times before the rounding error, and that number looks awfully familiar (2^24).

Floating point numbers are imprecise; they can only hold so many significant digits, and will simply ignore values deemed 'too insignificant' if needed to store their current value.

The key is the 'floating' part of the name: the variable lets the 'point' float to wherever it is needed to store the value, so a floating point variable can store a very large value or a very fine-grained one, since it can 'move' the point wherever it needs to. But it usually can't store a value that is both large and precise.

But 'large' simplifies it too much: any number that has a lot of significant digits higher up won't be able to hold much precision lower down. Since you are trying to add something so very small, you lose the ability to handle such precision very quickly.

If you took a very large value, you could find that even adding/subtracting whole numbers would still result in no change.
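As a rough illustration of that last point, here is a minimal sketch using 2^24 as the "very large value". The class and method names are made up for the example, and the explicit (float) casts are there on the assumption that some runtimes may otherwise keep intermediate results in higher precision.

```csharp
using System;

static class LargeFloatDemo
{
    // Adds two floats and forces the result to be rounded to single precision.
    public static float AddRounded(float a, float b) => (float)(a + b);

    static void Main()
    {
        float big = 16777216f; // 2^24: from here on, adjacent floats are 2 apart

        // 2^24 + 1 would need 25 significand bits, so it rounds back to 2^24.
        Console.WriteLine(AddRounded(big, 1f) == big); // True

        // 2^24 + 2 is representable, so the sum does change.
        Console.WriteLine(AddRounded(big, 2f) == big); // False
    }
}
```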

EDIT: See Stephen Canon's answer for a more precise answer, too. ;)

There's a lot of very wooly thinking here. Floating point numbers are not "imprecise". There is no "may". It's a deterministic system, like anything else on a computer.

Don't try to analyze what's going on by looking at decimal representations. The source of this behavior is completely obvious if you look at these numbers in binary or hexadecimal. Let's use binary:

float.Epsilon is b1.0 x 2^-149
2.3509887E-38 is b1.0 x 2^-125
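If you want to check these two values yourself, a small sketch like the following dumps the raw bits as sign, exponent field, and fraction. Bits is a hypothetical helper written for this example, not a framework method. Note that float.Epsilon is a subnormal: its exponent field is all zeros and it is stored as 0.00...01 x 2^-126, which is the same value as 1.0 x 2^-149.

```csharp
using System;

static class FloatBitsDemo
{
    // Returns the 32 raw bits of a float as "sign exponent fraction".
    public static string Bits(float value)
    {
        int raw = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        string s = Convert.ToString(raw, 2).PadLeft(32, '0');
        return s.Substring(0, 1) + " " + s.Substring(1, 8) + " " + s.Substring(9);
    }

    static void Main()
    {
        // Exponent field 0 (subnormal), lowest fraction bit set: 2^-149.
        Console.WriteLine(Bits(float.Epsilon));  // 0 00000000 00000000000000000000001

        // Exponent field 2 (2 - 127 = -125), fraction all zeros: 1.0 x 2^-125.
        Console.WriteLine(Bits(2.3509887E-38F)); // 0 00000010 00000000000000000000000
    }
}
```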

If we add these two numbers together, the infinitely precise (unrounded) sum is:

b1.000 0000 0000 0000 0000 0000 1 x 2^-125

Note that the significand of this sum is 25 bits wide (I've grouped the binary digits into sets of four to make them easier to count). This means that it cannot be represented in single-precision, so the result of this sum is not this value, but instead this value rounded to the closest representable float. The two closest representable numbers are:

b1.000 0000 0000 0000 0000 0000 x 2^-125
b1.000 0000 0000 0000 0000 0001 x 2^-125

Our number is exactly halfway in between them. Since you haven't set the rounding mode in your program, we are in the default rounding mode, which is called "round to nearest, ties to even". Because the two options are equally close, the tie is broken by choosing the one whose lowest-order bit is zero. Thus, 2^-125 + 2^-149 is rounded to 2^-125, which is why "it stops increasing".
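The tie-breaking rule can be seen from both sides with a short sketch. AddEpsilon is a name invented for this example, and the (float) casts assume we want every result rounded to single precision. Starting from the next float above 2^-125, whose lowest-order bit is one, the same halfway case rounds up instead of down.

```csharp
using System;

static class TiesToEvenDemo
{
    // Adds float.Epsilon once, rounding the result to single precision.
    public static float AddEpsilon(float x) => (float)(x + float.Epsilon);

    static void Main()
    {
        float init = 2.3509887E-38F;                    // 2^-125, lowest-order bit 0
        float next = (float)(init + 2 * float.Epsilon); // 2^-125 + 2^-148, lowest-order bit 1

        // Halfway between init and next: the tie goes to init (even).
        Console.WriteLine(AddEpsilon(init) == init); // True

        // Halfway between next and the float above it: the tie goes up (even).
        Console.WriteLine(AddEpsilon(next) > next);  // True
    }
}
```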

Because epsilon (1.401298E-45) is too small compared to 2.3509887E-38F: when the two are added together, there are not enough bits in a float to represent the sum exactly, so the entire epsilon is lost.

Floating-point math on computers doesn't work the way we're taught math at school because numbers here are represented with a finite number of bits, which restricts your math to a certain range of values (minimum and maximum) and certain limited precision (number of digits in mantissa).
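If the goal is still to visit every float in order, one possible approach is to step the bit pattern instead of adding float.Epsilon. The sketch below assumes non-negative, finite inputs, and NextUp is a hypothetical helper written for this example. Counting the steps from 0 up to 2.3509887E-38F gives the same 16777216 mentioned in the question.

```csharp
using System;

static class FloatWalkDemo
{
    // Next representable float above x.
    // Assumption: x is non-negative (+0 or greater) and not NaN or Infinity.
    public static float NextUp(float x)
    {
        int raw = BitConverter.ToInt32(BitConverter.GetBytes(x), 0);
        return BitConverter.ToSingle(BitConverter.GetBytes(raw + 1), 0);
    }

    // Counts how many NextUp steps it takes to get from 'from' to 'to'.
    public static long StepsBetween(float from, float to)
    {
        long steps = 0;
        for (float f = from; f != to; f = NextUp(f))
        {
            steps++;
        }
        return steps;
    }

    static void Main()
    {
        float init = 2.3509887E-38F;

        // Unlike init + float.Epsilon, this does move past init.
        Console.WriteLine(NextUp(init) > init);        // True

        // Every float from 0 up to 2^-125 is visited exactly once.
        Console.WriteLine(StepsBetween(0.0F, init));   // 16777216
    }
}
```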

