
What data is written out when a higher precision is used to display a number than the one supported by the format?

The IEEE 754 double precision floating point format has a binary precision of 53 bits, which translates into log10(2^53) ≈ 15.95, i.e. about 16 significant decimal digits.

If the double precision format is used to store a floating point number in a 64-bit word in memory, with 52 bits for the significand and 1 hidden bit, but a larger precision is used to output the number to the screen, what data is actually read from memory and written to the output?

How can it even be read when the total length of the word is 64 bits? Does the read-from-memory operation on the machine simply read more bits and interpret them as an extension of the significand?

For example, take the number 0.1. It does not have an exact binary floating point representation regardless of the precision used, because its binary expansion repeats infinitely in the significand.

Suppose 0.1 is stored in double precision and printed to the screen with a precision greater than 16, like this in C++:

#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
    double x = 0.1;
    // Request up to 50 significant digits, far more than the ~16 a double carries.
    cout << setprecision(50) << "x = " << x << endl;
}

The output (on my machine, at the time of execution) is:

x = 0.1000000000000000055511151231257827021181583404541

If correct rounding is used with 2 guard bits and 1 sticky bit, can I trust the decimal values given by the first three non-zero binary floating point digits in the error 5.551115123125783e-18?

Every binary fraction is exactly equal to some decimal fraction. If, as is usually the case, double is a binary floating point type, each double number has an exactly equal decimal representation.

For what follows, I am assuming your system uses IEEE 754 64-bit binary floating point to represent double. That is not required by the standard, but is very common. The closest number to 0.1 in that format has the exact value 0.1000000000000000055511151231257827021181583404541015625

Although this number has a lot of digits, it is exactly equal to 3602879701896397/2^55. Multiplying both numerator and denominator by 5^55 converts it to a decimal fraction with denominator 10^55, while increasing the number of digits in the numerator.

One common approach, consistent with the result in the question, is to round the exact decimal value to nearest at the number of digits requested by the format. That will indeed give useful information about the rounding error on conversion of a string to double.
