What data is written out when a higher precision is used to display a number than the one supported by the format?
The IEEE 754 double precision floating point format has a binary precision of 53 bits, which translates into log10(2^53) ≈ 15.95, i.e. about 16 significant decimal digits.
If the double precision format is used to store a floating point number in a 64-bit word in memory, with 52 bits for the significand and 1 hidden bit, but a larger precision is used to output the number to the screen, what data is actually read from memory and written to the output?
How can it even be read when the total length of the word is 64 bits? Does the read-from-memory operation on the machine simply read more bits and interpret them as an extension of the significand of the number?
For example, take the number 0.1. It does not have an exact binary floating point representation regardless of the precision used, because its significand has an infinitely repeating binary pattern.
If 0.1 is stored in double precision and printed to the screen with a precision greater than 16, like this in C++:
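#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    // 0.1 is rounded to the nearest double when it is stored
    double x = 0.1;
    // Request 50 significant digits, far more than the ~16 a double carries
    cout << setprecision(50) << "x = " << x << endl;
}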
The output (on my machine at the point of execution) is:
x = 0.1000000000000000055511151231257827021181583404541
If correct rounding is used with 2 guard bits and 1 sticky bit, can I trust the decimal value given by the first three non-zero digits of the error, 5.551115123125783e-18?
Every binary fraction is exactly equal to some decimal fraction. If, as is usually the case, double is a binary floating point type, each double number has an exactly equal decimal representation.
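As a short worked illustration of why this holds (the symbols m and k are introduced here only for the example): a binary fraction is an integer m divided by a power of two, and multiplying numerator and denominator by the matching power of five turns the denominator into a power of ten.

```latex
\frac{m}{2^{k}} = \frac{m \cdot 5^{k}}{2^{k} \cdot 5^{k}} = \frac{m \cdot 5^{k}}{10^{k}},
\qquad \text{e.g.} \quad \frac{3}{8} = \frac{3 \cdot 5^{3}}{10^{3}} = \frac{375}{1000} = 0.375
```

The decimal expansion therefore terminates after at most k digits past the decimal point.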
For what follows, I am assuming your system uses IEEE 754 64-bit binary floating point to represent double. That is not required by the standard, but is very common. The closest number to 0.1 in that format has exact value 0.1000000000000000055511151231257827021181583404541015625
Although this number has a lot of digits, it is exactly equal to 3602879701896397/2^55. Multiplying both numerator and denominator by 5^55 converts it to a decimal fraction, while increasing the number of digits in the numerator.
One common approach, consistent with the result in the question, is to round to nearest at the number of digits requested by the format. That will indeed give useful information about the rounding error on conversion of a string to double.