How does floating point error propagate when doing mathematical operations in C++?

Let's say that we have declared the following variables:

float a = 1.2291;
float b = 3.99;

float variables have a precision of 6 digits, which (if I understand correctly) means that the difference between the number the computer actually stores and the number you actually want will be less than 10^-6.

That means that both a and b have some error that is less than 10^-6.

So inside the computer, a could actually be 1.229100000012123 and b could be 3.9900000191919.

Now let's say that you have the following code:

float c = 0;
for(int i = 0; i < 1000; i++)
      c += a + b;

My question is: will c's final result also have a precision error that is less than 10^-6, or not?

And if the answer is no, how can we actually know this precision error, and what exactly happens if you apply any kind of operation, as many times as you wish and in any order?

> float variables have a precision of 6 digits, which (if I understand correctly) means that the difference between the number the computer actually stores and the number you actually want will be less than 10^-6.
>
> That means that both a and b have some error that is less than 10^-6.

The 10^-6 figure is a rough measure of the relative accuracy when representing arbitrary constants as floats. Not all numbers will be represented with an absolute error of 10^-6. The number 8765432.1, for instance, can be expected to be represented only approximately to the unit: if you are at least a little bit lucky, you will get 8765432 when representing it as a float. On the other hand, 1E-15f can be expected to be represented with an absolute error of at most about 10^-21.

> So inside the computer, a could actually be 1.229100000012123 and b could be 3.9900000191919.

No, sorry, that is not how it works: you don't write out the entire number and then add six zeroes for the possible error. The error can be estimated by counting six digits from the leading digit, not from the last digit. Here, you could expect something like 1.22910012123 or 3.990000191919.

(Actually, you would get exactly 1.2290999889373779296875 and 3.9900000095367431640625. Don't forget that the representation error can be negative as well as positive, as it is for the first number.)

> Now let's say that you have the following code […]
>
> My question is: will c's final result also have a precision error that is less than 10^-6, or not?

No. The total absolute error will be the sum of all the representation errors of a and b, for each of the thousand times you used them, plus the errors of the 2000 additions you did. That's 4000 different sources of error! Many of them will be identical, and some of them will happen to compensate each other, but the end result will probably not have 10^-6 relative accuracy; it will be more like 10^-5 (a guess, made without actually counting).

This is a very good question, and one that has been addressed for decades by many authorities; it is a computer science discipline in its own right (for example). From "What Every Computer Scientist Should Know About Floating-Point Arithmetic":

> Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and *rounding error*, continues with a discussion of the IEEE floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.

(Emphasis mine.)

The short answer is that you cannot easily determine the precision of a long chain of floating-point operations.

The precision of an operation like "c += a + b" depends not only on the raw precision of the floating-point implementation (which these days is almost always IEEE 754), but also on the actual values of a, b and c.

Further to that, the compiler may choose to optimize the code in different ways, which can result in unexpected issues: for example, transforming it to "c+=a; c+=b;", or simply performing the loop as "tmp = a*1000; tmp += b*1000; c += tmp;", or some other variant that the compiler determines is faster and mathematically equivalent, but which does not necessarily give the same floating-point result. (Compilers generally reorder floating-point arithmetic like this only when explicitly allowed to, for example with GCC's -ffast-math.)

The bottom line is that precision cannot be analyzed by inspecting the source code alone.

For that reason, many people simply use a higher-precision type such as double or long double and then check that the precision issues are gone for all practical purposes.

If that does not suffice, a fallback is always to implement all the logic in integers and avoid floats altogether.
