On a float rounding error

I do not understand the output of the following program:

#include <iostream>

int main()
{
    float  x     = 14.567729f;
    float  sqr   = x * x;
    float  diff1 = sqr - x * x;
    double diff2 = double(sqr) - double(x) * double(x);
    std::cout << diff1 << std::endl;
    std::cout << diff2 << std::endl;
    return 0;
}

Output:

6.63225e-006
6.63225e-006

I use VS2010, x86 compiler.

I expected a different output:

0
6.63225e-006

Why is diff1 not equal to 0? To calculate sqr - x * x, the compiler seems to raise the float precision to double. Why?

On x86, the x87 floating-point registers are 80 bits wide.

While an expression is being evaluated, the intermediate result can be held at that full 80-bit width. It only gets rounded to 32 bits (float) or 64 bits (double) when it is stored to a location in memory. If everything stays in registers (try compiling with -O3) you may see a different result.
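
A minimal sketch of that effect, not taken from the original program: the hypothetical helper rounded_diff stores both products through volatile float variables, which is assumed here to force each one out to a 32-bit memory location before the subtraction. Under that assumption both operands hold the same float value, so the difference is exactly 0 whatever width the registers use.

```cpp
// Hypothetical helper (not part of the original program).
// Writing to a volatile float forces the value into a 32-bit memory
// location, so any extra register precision is rounded away first.
float rounded_diff(float x)
{
    volatile float sqr  = x * x;  // rounded to float when stored
    volatile float prod = x * x;  // rounded the same way
    float a = sqr;                // read back the rounded values
    float b = prod;
    return a - b;                 // both operands are identical floats
}
```

Calling rounded_diff(14.567729f) from main should give 0, which is the value the question expected for diff1.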

Compiled with -O3:

> ./a.out
0
6.63225e-06

float  diff1 = sqr - x * x;
double diff2 = double(sqr) - double(x) * double(x);

Why is diff1 not equal to 0?

Because you have already stored sqr = x*x, which forced its value to be rounded to a float.

To calculate sqr - x * x, the compiler seems to raise the float precision to double. Why?

Because that is how C did things back before there was a C standard. I don't think modern compilers are bound to that convention, but many still do follow it. If this is the case, the right-hand sides of the calculations of diff1 and diff2 will be identical. The only difference is that after calculating the right-hand side of float diff1 = ... , the double result is converted back to a float.
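
To illustrate that last point, here is a rough sketch in which the promotion is written out by hand. The helper name widened_diff is made up for this example, and it assumes IEEE 754 float and double. The product of two floats fits exactly in a double, so the subtraction exposes the rounding error that was introduced when sqr was stored as a float.

```cpp
// Hypothetical helper (not part of the original program); assumes IEEE 754
// float (24-bit significand) and double (53-bit significand).
float widened_diff(float x)
{
    volatile float stored = x * x;         // x*x rounded to float, as in sqr
    double sqr   = stored;                 // promoted back; value unchanged
    double exact = double(x) * double(x);  // at most 48 bits, exact in a double
    double diff  = sqr - exact;            // rounding error of the float store
    return float(diff);                    // same conversion as float diff1 = ...
}
```

With x = 14.567729f this should come out close to the 6.63225e-06 reported in the question rather than 0.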

Apparently the standard allows floats to be automatically promoted to double in expressions like that. See here

Do a find on that page for "automatically promoted" and check out the first paragraph with that phrase in it.

If we go by that paragraph, as I understand it, the x*x in your sqr = x*x is initially evaluated as a double, but it is rounded to a float once it is stored in sqr. Then, in diff1 = sqr - x*x, x*x is again evaluated as a double, and sqr is promoted to double as well, although its value has already been rounded. So diff1 yields the same result as casting everything to double: sqr is a double holding a value rounded to float precision, while x*x has full double precision.

On x86/x64 architectures it is common for compilers to promote all 32-bit floats to 64-bit doubles for computations; check the output assembly to see if the two variants produce the same instructions. The only difference between the types is the storage.
