
C - floating point rounding

I'm trying to understand how floating point numbers work.

I'd like to test what I know (and what I still need to learn) by working out the following: I want to find the smallest x such that x + 1 = x, where x is a floating point number.

As I understand it, this would happen when x is large enough that x + 1 is closer to x than to the next representable floating point number above x. So intuitively it seems this is the case where I don't have enough digits in the significand. Would this number x then be the one whose significand is all 1's? But then I can't figure out what the exponent would have to be. Obviously it would have to be big (relative to 10^0, anyway).

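You just need an expression for the value of the LS bit of the mantissa in terms of the exponent. When this is > 1, you have met your condition. A single precision float stores 23 fraction bits (24 significant bits counting the implicit leading 1), so the LS bit has a value of 2^-23 * 2^exp, and the condition is met when exp > 23, i.e. exp = 24. The smallest (normalized) number satisfying the condition is therefore 1.0 * 2^24 = 16777216.0f. At that point the LS bit is worth 2, so x + 1 lies exactly halfway between two representable values, and the default round-to-nearest-even mode rounds it back to x.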

I haven't checked this, so my maths may be off somewhere (e.g. by a factor of 2), and it's also possible that the FP unit rounds beyond the 24th bit, in which case a further factor of 2 may be needed to account for that, but you get the general idea.

Start with 1.0, and keep doubling it until the test succeeds:

#include <stdio.h>
int main(void) {
double x;
for (x = 1.0; x + 1 != x; x *= 2) { }  /* stop at the first power of two that absorbs the +1 */
printf("%g + 1 = %g\n", x, x + 1);
}

I suggest that while trying to understand fp numbers and fp arithmetic you work in decimal with 5 digits in the significand and 2 in the exponent. (Or, if 5 and 2 don't suit you, 6 and 3 or any other small numbers you like.) The issues of:

  • the limited set of numbers which can be represented;
  • non-associativity and non-distributivity (correctly rounded addition and multiplication do remain commutative);
  • the problems which can arise when treating fp numbers as real numbers;

are all much easier to figure out in decimal and the lessons you learn are entirely general. Once you've got this figured out, enhancing your knowledge with IEEE fp arithmetic will be relatively straightforward. You'll also be able to figure out other fp arithmetic systems with relative ease.
