Floating point rounding

Question

I'm writing code in C++ and have defined PI as:

const double MathConstants::PI = atan(1.0)*4.0;

Elsewhere in my code I set a float variable:

float result = (float) (-MathConstants::PI / 2.0);

When I dump the bytes of result from memory I get DB 0F C9 BF (little endian). In big-endian order that is BF C9 0F DB.

According to http://babbage.cs.qc.cuny.edu/IEEE-754.old/Decimal.html that is equivalent to -1.5707964.

What I don't understand is why I get -1.5707964 instead of -1.5707963. I would expect -1.5707963, since -PI/2 is -1.5707963267948966.

Can someone enlighten me here?

Answer 1

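The rounding is binary, not decimal. When you look at the result in decimal it seems wrong, because you expect the last decimal digit to be rounded, but that's not what happens. The value is rounded to the last bit of the float's 24-bit significand, and only then converted to decimal for display. The float nearest to -PI/2 is exactly -1.57079637050628662109375, which prints as -1.5707964 at eight significant digits.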

I typed 1.57079632679489661923132169163975 as the constant for PI/2 into the link you provided, and it gave the proper output of 1.5707964.

Compare the significand of the 64-bit representation with that of the 32-bit representation:

1 .1001001000011111101101010100010001000010110100011000
1 .10010010000111111011011

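The part that was cut off starts with a 1 and has more nonzero bits after it, so it is worth more than half of the last kept bit. Proper round-to-nearest therefore bumps the 32-bit result up: the kept bits end in ...010 before rounding and ...011 after.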

1 answer

solution1
4 ACCEPTED 2012-03-21 19:51:45
