Rounding error using the floor function in C++

Question

I was asked what the output of the following code will be:

floor((0.7+0.6)*10);

It returns 12.

I know that floating-point representation cannot represent all numbers with infinite precision and that I should expect some discrepancies.

My questions are:

How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
When can I expect the floor function to work incorrectly and when it works correctly for sure?

Note: I'm not asking what the floating-point representation looks like or why the output isn't exactly 13. I'd like to know how I should infer that (0.7+0.6)*10 is a bit less than 13.

Answer 1

How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?

Assume that your compilation platform uses strictly the IEEE 754 standard formats and operations. Then, convert all the constants involved to binary, keeping 53 significant digits, and apply the basic operations, as defined in IEEE 754, by computing the mathematical result and rounding to 53 significant binary digits at each step. A computer does not need to be involved at any stage, but you can make your life easier by using C99's hexadecimal floating-point format for input and output.

When can I expect the floor function to work incorrectly and when it works correctly for sure?

floor() is exact for every finite argument: its result is always representable, so it never rounds. It is working correctly in your example. The behavior that surprises you does not originate with floor and has nothing to do with floor. It starts with the fact that 6/10 and 7/10 are not exactly representable as binary floating-point values, and continues with the fact that, because these values have long binary expansions, the floating-point operations + and * can produce a result that is slightly rounded with respect to the mathematical result you would expect from the arguments they are actually applied to. floor() is the only place in your code that does not involve approximation.

Example program to see what is happening:

#include <stdio.h>
#include <math.h>

int main(void) {
  printf("%a\n%a\n%a\n%a\n%a\n",
         0.7,
         0.6,
         0.7 + 0.6,
         (0.7+0.6)*10,
         floor((0.7+0.6)*10));
}

Result:

0x1.6666666666666p-1
0x1.3333333333333p-1
0x1.4ccccccccccccp+0
0x1.9ffffffffffffp+3
0x1.8p+3

IEEE 754 double-precision is really defined with respect to binary, but for conciseness the significand is written in hexadecimal. The exponent after p represents a power of two. For instance, the last two results are both of the form <number roughly halfway between 1 and 2>*2³.

0x1.8p+3 is 12. The next integer, 13, is 0x1.ap+3, but the computation does not quite reach that value, and so the behavior of floor() is to round down to 12.

Answer 2

How should I know that this piece of code returns 12, not 13?

You should know that it may be either 12 or 13. You can verify which by testing on a given CPU.

You cannot know what the value will be in general, because the C++ standard does not specify the representation of floating-point numbers. If you know the format on a given architecture (let's say IEEE 754), then you can perform the calculation by hand, but that result would only apply to that particular representation.

Why is (0.7+0.6)*10 a bit less than 13, not a bit more?

It's an implementation detail and not useful knowledge to the programmer. All you need to know is that it may be either. Relying on the knowledge that it's one or the other would make you depend on that implementation detail.

When can I expect the floor function to work incorrectly and when it works correctly for sure?

It always works correctly, that is, according to how it's specified to work.

Now, speaking of the value that you are expecting to see. If you know that your number is very close to an integer, but might be off a little bit due to representation error, you can add 0.5 before flooring.

double calculated_integer = (0.7+0.6)*10;
double rounded = floor(calculated_integer + 0.5);  // keep the result of floor()

That way, you will always get the expected value, unless the error exceeds 0.5, which would be quite a big error.

If you don't know that the result should be an integer, then you simply have to accept the fact that floor and ceil operations increase the maximum error of your calculation to 1.0.

Answer 3

How should I know that this piece of code returns 12, not 13?

Since that depends on the numbers involved, by trying.

Why is (0.7+0.6)*10 a bit less than 13, not a bit more?

Well, because that's the result of the calculation.

When can I expect the floor function to work incorrectly and when it works correctly for sure?

Correctly for sure: on multiples of powers of two only, iff your floating-point number is represented in binary.

To really take all the confusion out of this:

You cannot know the result without calculating it; it depends on both the machine/algorithmics involved and the numbers.

Answer 4

There are standards, such as the IEEE floating-point standard, that try to make floating-point calculations at least somewhat predictable by defining rules for how operations like addition and rounding should be implemented. To know the result, you need to compute the expression according to the standard's rules. Then you can be sure that it gives the same result on every machine that implements the standard.

Answer 5

Very short answer: you cannot. It depends on the platform and on the floating-point standard used on that platform.

Answer 6

In general, you can't. The fundamental problem is that the conversion from text representation to floating-point value is often not implemented as accurately as it could be. That's in part momentum, and in part because getting the floating-point value that's closest to the value expressed in text can be expensive, in some cases requiring large integer calculations. So conversions are often off by a few ULPs (i.e., low-end bits) from the ideal value, in ways that you can't predict a priori. So the question of what that code will produce is unanswerable. The question of what it should produce may be a bit more tractable, but it's still an exercise in time-wasting.

6 answers

solution1
4 2016-02-01 14:48:28

solution2
2 2016-02-01 14:25:11

solution3
1 2016-02-01 14:11:52

solution4
1 ACCEPTED 2016-02-01 14:16:49

solution5
0 2016-02-01 14:12:14

solution6
0 2016-02-01 14:25:46