How come multiplication is as fast as addition for C++ double type values?
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 rng(std::chrono::system_clock::now().time_since_epoch().count());
    std::uniform_real_distribution<double> dist(0.5, 1);

    // Fill the input with random values before the timed section.
    const int N = 100000000;
    std::vector<double> q;
    q.reserve(N);
    for (int i = 0; i < N; ++i) q.emplace_back(dist(rng));

    double sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i < N; ++i) {
        sum += q[i] + q[i - 1]; // change + to - or * or /, it takes the same time.
    }
    auto end = std::chrono::steady_clock::now();

    // Elapsed time of the loop in milliseconds, then the result so the loop is not optimized away.
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << std::endl;
    std::cout << sum << std::endl;
}
Addition and subtraction should be simple processes, maybe some shifts and bitwise operations whose cost is proportional to the precision. Multiplication and division are naturally more complicated. For multiplication, it seems natural for it to be an order of magnitude slower (something like O(n^2) if addition takes O(n), since multiplication can be broken down into additions of shifted values). Division should be even harder.

Yet for all four arithmetic operations on double values, this code takes ~110 ms with optimization. How is this possible? What magic is going on here that allows C++ to handle multiplication as quickly as addition, ...or handle addition as slowly as multiplication?

P.S. For integers, it takes about twice as long, but only for division.
On some processors, floating-point multiplication is as fast as addition because:

Nonetheless, you may see differences between the times of addition and multiplication. Current processor designs are quite complicated, and processors typically have multiple units for doing various floating-point operations. A processor could have more units for addition than for multiplication, so it could do more additions per unit of time than multiplications.

However, observe the expression you are using:
sum += q[i] + q[i - 1];
This causes sum to be serially dependent on its prior value. The processor can add q[i] to q[i-1] without waiting for prior additions, but then, to add to sum, it must wait for the prior add to sum to complete. This means that, if a processor has two units for addition, it could be working on both q[i] + q[i-1] and the prior addition to sum at the same time. But if it had more addition units, it could not go any faster. It could use the extra units to do more of those q[i] + q[i - 1] additions for different values of i, but every addition to sum has to wait for the previous one. Therefore, with two or more addition units, this computation is bound by the latency of addition, which is how long it takes to do a single addition. (This is in contrast to the throughput of addition, which is how many additions the processor can do per unit of time when there is no serial dependency.)
If you used a different computation, such as
sum += q[i];
or
sum0 += q[i]; sum1 += q[i+1]; sum2 += q[i+2]; sum3 += q[i+3];
then you could see different times for addition and multiplication, depending on how many addition units and how many multiplication units the processor had.