C++ vs python numpy complex arrays performance
Can anyone tell me why these two programs have a huge difference in run time? I am simply multiplying two large complex arrays and comparing the time in Python (numpy) and C++. I am using the -O3 flag with g++ to compile the C++ code. I find that the huge difference appears only when I use complex floats in C++; numpy is more than 20 times faster.
Python code:
import numpy as np
import time

if __name__ == "__main__":
    # check the data type is the same
    a = np.zeros((1), dtype=np.complex128)
    a[0] = complex(3.4e38, 3.5e38)  # np.complex was removed in NumPy 1.24; the builtin works the same
    print(a)
    b = np.zeros((1), dtype=np.complex64)
    b[0] = complex(3.4e38, 3.5e38)
    print(b)  # imaginary part is infinity

    length = 5000
    A = np.ones((length), dtype=np.complex64) * complex(1, 1)
    B = np.ones((length), dtype=np.complex64) * complex(1, 0)

    num_iterations = 1000000
    time1 = time.time()
    for _ in range(num_iterations):
        A *= B
    time2 = time.time()
    duration = ((time2 - time1) * 1e6) / num_iterations
    print(duration)
C++ code:
#include <iostream>
#include <complex>
#include <chrono>

using namespace std::chrono;
using namespace std;

int main()
{
    // check the data type is the same
    complex<double> a = complex<double>(3.4e38, 3.5e38);
    cout << a << endl;
    complex<float> b = complex<float>(3.4e38, 3.5e38);
    cout << b << endl; // imaginary part is infinity

    const int length = 5000;
    static complex<float> A[length];
    static complex<float> B[length];
    for (int i = 0; i < length; i++) {
        A[i] = complex<float>(1, 1);
        B[i] = complex<float>(1, 0);
    }

    int num_iterations = 1000000;
    auto time1 = high_resolution_clock::now();
    for (int k = 0; k < num_iterations; k++)
        for (int i = 0; i < length; i++)
            A[i] *= B[i];
    auto time2 = high_resolution_clock::now();

    auto duration = duration_cast<microseconds>(time2 - time1);
    cout << "average time: " << duration.count() / num_iterations << endl;
}
The C++ compiler is doing some extra checking gymnastics for you in order to properly handle NaNs and other such "standard" behavior. If you add the -ffast-math optimization flag, you'll get more sane speed, but less "standard" behavior: for example, complex<float>(inf,0) * complex<float>(inf,0) won't be evaluated as complex<float>(inf,0). Do you really care?
numpy is doing what makes sense, not hindered by a narrow reading of the C++ standard.
e.g. until very recent g++ versions, the latter of the following two functions is much faster unless -ffast-math is used.
complex<float> mul1(complex<float> a, complex<float> b)
{
    return a * b;
}

complex<float> mul2(complex<float> a, complex<float> b)
{
    float* fa = reinterpret_cast<float*>(&a);
    const float* fb = reinterpret_cast<float*>(&b);
    float cr = fa[0] * fb[0] - fa[1] * fb[1];
    float ci = fa[0] * fb[1] + fa[1] * fb[0];
    return complex<float>(cr, ci);
}
You can experiment with this on https://godbolt.org/z/kXPgCh to see the assembly output and how the former function defaults to calling __mulsc3.
PS Ready for another wave of anger at what the C++ standard says about std::complex<T>? Can you guess how std::norm must be implemented by default? Play along: follow the link and spend ten seconds thinking about it.

Spoiler: it probably is using a sqrt then squaring it.