Different Truncation Results When Casting

Question

I'm having some difficulty predicting how my C code will truncate results. Consider the following:

float fa,fb,fc;
short ia,ib;

fa=160;
fb=0.9;
fc=fa*fb;
ia=(short)fc;
ib=(short)(fa*fb);

The results are ia=144, ib=143.

I can understand the reasoning for either result, but I don't understand why the two calculations are treated differently. Can anyone refer me to where this behaviour is defined or explain the difference?

Edit: these results come from code compiled with MS Visual C++ Express 2010 on an Intel Core i3-330M. I get the same results with gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) under VirtualBox on the same machine.

Answer 1

The compiler is allowed to use more precision for a subexpression like fa*fb than it uses when assigning to a float variable like fc. So it's the fc= part that very slightly changes the result (and happens to make a difference in the integer truncation).

Answer 2

aschepler explained the mechanics of what's going on well, but the fundamental problem with your code is that it uses a value which does not exist as a float, in code whose result depends on that value's approximation in an unstable way. If you want to multiply by 0.9 (the actual number 0.9=9/10, not the floating point value 0.9 or 0.9f) you should multiply by 9 then divide by 10, or forget about floating point types and use a decimal arithmetic library.

A cheap and dirty way around the problem, when the unstable points are isolated as in your example here, is to just add a value (typically 0.5) which you know will be larger than the error but smaller than the difference from the next integer before truncating.

Answer 3

This is compiler dependent. On mine (gcc 4.4.3) it produces the same result for both expressions, namely 144, probably because the identical expression is optimized away.

Others explained well what happened. In other words, I would say that the difference probably arises because your compiler internally promotes the floats to 80-bit FPU registers before performing the multiplication, then converts the result back either to float or to short.

If my hypothesis is true, then if you write ib = (short)(float)(fa * fb); you should get the same result as when casting fc to short.
