Sine calculation orders of magnitude slower than cosine

Question

tl;dr

For the same numpy array, calculating np.cos takes 3.2 seconds, whereas np.sin takes 548 seconds (nine minutes) on Linux Mint.

See this repo for full code.

I've got a pulse signal (see image below) which I need to modulate onto an HF carrier, simulating a Laser Doppler Vibrometer. Therefore the signal and its time basis need to be resampled to match the carrier's higher sampling rate.

[Image: pulse signal to be modulated onto the HF carrier]

In the following demodulation process both the in-phase carrier cos(omega * t) and the phase-shifted carrier sin(omega * t) are needed. Oddly, the time to evaluate these functions depends highly on the way the time vector has been calculated.

The time vector t1 is calculated using np.linspace directly; t2 uses the method implemented in scipy.signal.resample.

import numpy as np

pulse = np.load('data/pulse.npy')  # 768 samples

pulse_samples = len(pulse)
pulse_samplerate = 960  # 960 Hz
pulse_duration = pulse_samples / pulse_samplerate  # here: 0.8 s
pulse_time = np.linspace(0, pulse_duration, pulse_samples,
                         endpoint=False)

carrier_freq = 40e6  # 40 MHz
carrier_samplerate = 100e6  # 100 MHz
# np.linspace and np.arange need an integer sample count
carrier_samples = int(round(pulse_duration * carrier_samplerate))  # 80 million

t1 = np.linspace(0, pulse_duration, carrier_samples)

# method used in scipy.signal.resample
# https://github.com/scipy/scipy/blob/v0.17.0/scipy/signal/signaltools.py#L1754
t2 = np.arange(0, carrier_samples) * (pulse_time[1] - pulse_time[0]) \
        * pulse_samples / float(carrier_samples) + pulse_time[0]

As can be seen in the picture below, the time vectors are not identical. At 80 million samples the difference t1 - t2 reaches 1e-8.

[Image: difference between the time vectors t1 and t2]
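The size of that difference can be traced to the step width of the two constructions. The following is a minimal sketch, assuming a scaled-down sample count (n = 8000 instead of 80 million, my choice to keep it cheap): np.linspace includes the endpoint by default, so its step is duration / (n - 1), while the arange-based vector uses duration / n. The gap accumulates to duration / n at the last sample, which is 0.8 / 80e6 = 1e-8 for the real sizes.

```python
import numpy as np

duration = 0.8   # seconds, as in the question
n = 8000         # scaled down from 80 million (assumption, for speed)

# linspace with the default endpoint=True: step = duration / (n - 1)
t1 = np.linspace(0, duration, n)

# arange-based construction as in t2: step = duration / n
t2 = np.arange(n) * (duration / n)

max_diff = np.abs(t1 - t2).max()
print(max_diff, duration / n)  # the two numbers should agree closely

# With endpoint=False, linspace uses the same step as the arange version
t1_open = np.linspace(0, duration, n, endpoint=False)
print(np.allclose(t1_open, t2))
```

Passing endpoint=False to np.linspace would therefore make both vectors agree, at the cost of dropping the final time point.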

Calculating the in-phase and shifted carrier of t1 takes 3.2 seconds each on my machine.
With t2, however, calculating the shifted carrier takes 540 seconds. Nine minutes. For nearly the same 80 million values.

# carrier_freq is the 40 MHz carrier frequency defined above
omega_t1 = 2 * np.pi * carrier_freq * t1
np.cos(omega_t1)  # 3.2 seconds
np.sin(omega_t1)  # 3.3 seconds

omega_t2 = 2 * np.pi * carrier_freq * t2
np.cos(omega_t2)  # 3.2 seconds
np.sin(omega_t2)  # 9 minutes

I can reproduce this bug on both my 32-bit laptop and my 64-bit tower, both running Linux Mint 17. On my flatmate's MacBook, however, the "slow sine" takes as little time as the other three calculations.

I run Linux Mint 17.03 on a 64-bit AMD processor and Linux Mint 17.2 on a 32-bit Intel processor.

Answer 1

I don't think numpy has anything to do with this: I think you're tripping across a performance bug in the C math library on your system, one which affects sin near large multiples of pi. (I'm using "bug" in a pretty broad sense here. For all I know, since the sine of large floats is poorly defined, the "bug" is actually the library behaving correctly to handle corner cases!)
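One way to explore why t2 might be hit harder than t1 is to count how many phase values land very close to a multiple of pi. The following is only a sketch under my own assumptions: the sample count is scaled down to 100,000, the tolerance of 1e-6 is arbitrary, and fraction_near_pi_multiple is a hypothetical helper, not anything from numpy. With a 40 MHz carrier sampled at 100 MHz, the phase advances by 0.8 * pi per sample, so on an exact grid every fifth sample sits on a multiple of pi.

```python
import numpy as np

def fraction_near_pi_multiple(x, tol=1e-6):
    """Fraction of entries of x within tol of an integer multiple of pi."""
    r = np.mod(x, np.pi)
    dist = np.minimum(r, np.pi - r)  # distance to the nearest multiple
    return np.mean(dist < tol)

n = 100_000          # scaled down from 80 million (assumption)
carrier_freq = 40e6  # 40 MHz, as in the question
step = 1e-8          # 1 / 100 MHz
duration = n * step

t1 = np.linspace(0, duration, n)  # endpoint included, slightly larger step
t2 = np.arange(n) * step          # exact 100 MHz grid

frac_t1 = fraction_near_pi_multiple(2 * np.pi * carrier_freq * t1)
frac_t2 = fraction_near_pi_multiple(2 * np.pi * carrier_freq * t2)
print(frac_t1, frac_t2)
```

In this scaled-down setup about a fifth of the t2 phases fall next to a multiple of pi, while almost none of the t1 phases do. That would be consistent with the slow path being triggered mostly for t2, but it does not prove that this is what happens inside the math library.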

On Linux, I get:

>>> %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 191 µs per loop
>>> %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

and other Linux-using types from the Python chatroom report

10000 loops, best of 3: 49.4 µs per loop 
10000 loops, best of 3: 206 ns per loop

and

In [3]: %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 116 µs per loop

In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

but a Mac user reported

In [3]: %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 300 ns per loop

In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 361 ns per loop

for no order-of-magnitude difference. As a workaround, you might try taking things mod 2 pi first:

>>> new = np.sin(omega_t2[-1000:] % (2*np.pi))
>>> old = np.sin(omega_t2[-1000:])
>>> abs(new - old).max()
7.83773902468434e-09

which has better performance:

>>> %timeit -n 1000 new = np.sin(omega_t2[-1000:] % (2*np.pi))
1000 loops, best of 3: 63.8 µs per loop
>>> %timeit -n 1000 old = np.sin(omega_t2[-1000:])
1000 loops, best of 3: 6.82 ms per loop
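The workaround can be wrapped in a small helper. This is a minimal sketch, assuming the phase array looks like the one in the question; sin_reduced is a hypothetical name and the 1000-sample test vector is mine. The small deviation from np.sin comes from the floating-point value of 2*np.pi not being exactly 2*pi, an error that grows with the number of full turns removed.

```python
import numpy as np

def sin_reduced(x):
    """Sine with the argument first wrapped into [0, 2*pi)."""
    return np.sin(np.mod(x, 2 * np.pi))

# Phase values of the same magnitude as omega_t2 (up to about 2e8 rad)
x = 2 * np.pi * 40e6 * np.linspace(0.0, 0.8, 1000)

max_err = np.abs(sin_reduced(x) - np.sin(x)).max()
print(max_err)  # small, but not zero
```

Whether an error of this size is acceptable depends on the application. Reducing the phase before multiplying by 2*pi (wrapping carrier_freq * t modulo 1) is another option that avoids building large arguments in the first place.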

Note that as expected, a similar effect happens for cos, just shifted:

>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2)
1000 loops, best of 3: 37.6 µs per loop
>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2 + 0.12)
1000 loops, best of 3: 2.46 µs per loop
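The %timeit lines above are IPython magics. Outside IPython, roughly the same measurement can be sketched with the standard timeit module. time_sin is a hypothetical helper name, and the lambda adds a fixed call overhead to both numbers, so only the ratio between them is meaningful. No expected output is given because the result depends on the platform's math library.

```python
import math
import timeit

def time_sin(x, number=10000, repeat=3):
    """Best per-call time in seconds for math.sin(x), lambda overhead included."""
    timer = timeit.Timer(lambda: math.sin(x))
    return min(timer.repeat(repeat=repeat, number=number)) / number

base = 6e7 * math.pi

near = time_sin(base)         # argument close to a multiple of pi
away = time_sin(base + 0.12)  # argument shifted away from it

print(f"near multiple of pi: {near * 1e9:.0f} ns per call")
print(f"shifted by 0.12:     {away * 1e9:.0f} ns per call")
print(f"ratio:               {near / away:.1f}")
```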

Answer 2

One possible cause of these huge performance differences might be how the math library creates or handles IEEE floating-point underflow (or denormals), which might be produced by a difference in some of the tinier mantissa bits during transcendental function approximation. Your t1 and t2 vectors might differ in these smaller mantissa bits, and the result may also depend on the algorithm used to compute the transcendental function in whatever libraries you linked, as well as on the IEEE denormal or underflow handler of each particular OS.

Answer 1
18, accepted, 2016-03-05 18:56:22

Answer 2
4, 2016-03-05 18:57:34