
Why do some arithmetic operations take more time than usual?

I've noticed unusually long computation times when performing arithmetic operations on floating-point numbers of very small magnitude. The following simple code exhibits this behavior:

#include <time.h>
#include <stdlib.h>
#include <stdio.h>

const int MAX_ITER = 100000000;

int main(int argc, char *argv[]){
    double x = 1.0, y;
    int i;
    clock_t t1, t2;
    scanf("%lf", &y);
    t1 = clock();
    for (i = 0; i < MAX_ITER; i++)
        x *= y;
    t2 = clock();
    printf("x = %lf\n", x);
    printf("Time: %.5lfsegs\n", ((double) (t2 - t1)) / CLOCKS_PER_SEC);
    return 0;
}

Here are two different runs of the program:

  • With y = 0.5

    x = 0.000000
    Time: 1.32000segs

  • With y = 0.9

    x = 0.000000
    Time: 19.99000segs

I'm using a laptop with the following specs to test the code:

  • CPU: Intel® Core™2 Duo CPU T5800 @ 2.00GHz × 2
  • RAM: 4 GB
  • OS: Ubuntu 12.04 (64 bits)
  • Model: Dell Studio 1535

Could someone explain in detail why this behavior occurs? I'm aware that with y = 0.9 the value of x approaches 0 more slowly than with y = 0.5, so I suspect the problem is directly related to this.

Denormal (or rather, subnormal) numbers are often a performance hit. Slowly converging to 0, as in your second example, will generate more subnormals. Read more here and here. For more serious reading, check out the oft-cited (and very dense) What Every Computer Scientist Should Know About Floating-Point Arithmetic.

From the second source:

Under IEEE-754, floating point numbers are represented in binary as:

Number = signbit * mantissa * 2^exponent

There are potentially multiple ways of representing the same number. Using decimal as an example, the number 0.1 could be represented as 1×10^-1 or 0.1×10^0 or even 0.01×10^1. The standard dictates that numbers are always stored with the first bit as a one. In decimal, that corresponds to the 1×10^-1 example.

Now suppose that the lowest exponent that can be represented is -100. So the smallest number that can be represented in normal form is 1×10^-100. However, if we relax the constraint that the leading bit be a one, then we can actually represent smaller numbers in the same space. Taking a decimal example, we could represent 0.1×10^-100. This is called a subnormal number. The purpose of having subnormal numbers is to smooth the gap between the smallest normal number and zero.

It is very important to realise that subnormal numbers are represented with less precision than normal numbers. In fact, they are trading reduced precision for their smaller size. Hence calculations that use subnormal numbers are not going to have the same precision as calculations on normal numbers. So an application which does significant computation on subnormal numbers is probably worth investigating to see if rescaling (i.e. multiplying the numbers by some scaling factor) would yield fewer subnormals and more accurate results.

I was thinking about explaining it myself, but the explanation above is extremely well written and concise.

You get a measurable difference not because 0.9^n converges to 0 more slowly than 0.5^n mathematically, but because in IEEE-754 floating-point implementations, it doesn't converge to 0 at all.

The smallest positive double in IEEE-754 representation is 2^-1074, and the smallest positive normal is 2^-1022, so with y = 0.5 the loop encounters 52 subnormal numbers. Once the smallest positive subnormal is reached, the next product would be 2^-1075, but under the default round-to-nearest, ties-to-even rounding mode that is rounded to 0. (The IEEE-754 representation of floating point numbers and the default round-to-nearest, ties-to-even rounding mode are pretty much ubiquitous on standard consumer hardware, even if the standard is not fully implemented.) From then on, you have a multiplication 0*y, which is an ordinary floating point multiplication (that one is fast even if y is a subnormal number).

With 0.5 < y < 1, once you've reached the lower end of the (positive) subnormal range, the result of x*y rounds to the value of x again (for y = 0.9, the fixed point of the iteration is 5*2^-1074). Since that is reached after a few thousand iterations (0.9^7 < 0.5), you're basically multiplying a subnormal number by a nonzero number for the entire loop. On many processors, such a multiplication can't be handled directly and has to be handled in microcode, which is a lot slower.

If speed is more important than IEEE-754 semantics (or if those are undesirable for other reasons), many compilers offer options to disable that behaviour and flush subnormal numbers to 0 if the hardware supports it. I couldn't find an option for doing that explicitly in my gcc's man page, but -ffast-math did the trick here.
