Numba performance issue with np.nan and np.inf

Question

I am playing around with numba to accelerate my code. I notice that the performance varies significantly when using np.inf instead of np.nan inside the function. Below I have attached three sample functions for illustration.

function1 is not accelerated by numba.
function2 and function3 are both accelerated by numba, but one uses np.nan while the other uses np.inf.

On my machine, the average runtimes of the three functions are 0.032284s, 0.041548s and 0.019712s respectively. It appears that using np.nan is much slower than using np.inf. Why does the performance vary so much? Thanks in advance.

Edit: I am using Python 3.7.11 and Numba 0.55.0rc1.

import numpy as np
import numba as nb

def function1(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr, nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.nan
    output2[:] = np.nan

    for r in range(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold =np.nonzero( (row1 - row2) > 8 )
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0

    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.nan)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output

@nb.njit('float64[:,::1](float64[:,::1], float64[:,::1])', parallel=True)
def function2(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr,nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.nan
    output2[:] = np.nan

    for r in nb.prange(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold =np.nonzero( (row1 - row2) > 8 )
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0

    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.nan)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output

@nb.njit('float64[:,::1](float64[:,::1], float64[:,::1])', parallel=True)
def function3(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr,nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.inf
    output2[:] = np.inf

    for r in nb.prange(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold =np.nonzero( (row1 - row2) > 8 )
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0
    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.inf)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output


array1 = 10*np.random.random((1000,1000))
array2 = 10*np.random.random((1000,1000))

output1 = function1(array1, array2)
output2 = function2(array1, array2)
output3 = function3(array1, array2)

Answer 1

The second one is much slower because filtering with output1 != np.nan returns a full copy of output1: np.nan != np.nan is True, and the same holds for any other value (v != np.nan is always true). Nothing is filtered out, so the resulting arrays are much bigger, which makes execution slower.
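A minimal sketch of this effect in plain NumPy, using a small made-up array rather than the 1000x1000 inputs from the question (the variable names here are illustrative only):

```python
import numpy as np

# Small illustrative array: two real values and three NaN placeholders.
values_nan = np.array([1.0, np.nan, np.nan, 1.0, np.nan])

# NaN compares unequal to everything, including itself,
# so this mask is True at every position.
mask_wrong = values_nan != np.nan
kept_wrong = values_nan[np.nonzero(mask_wrong)]

# With an Inf sentinel the same comparison filters as intended.
values_inf = np.array([1.0, np.inf, np.inf, 1.0, np.inf])
kept_inf = values_inf[np.nonzero(values_inf != np.inf)]

print(kept_wrong.size)  # 5: nothing was removed
print(kept_inf.size)    # 2: only the real values remain
```

In the question's functions the same thing happens at full scale: the NaN version keeps every element of the flattened array, while the Inf version keeps only the entries that passed the threshold.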

The point is that you must never compare a value to np.nan using comparison operators: use np.isnan(value) instead. In your case, you should use np.logical_not(np.isnan(output1)).
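As a rough sketch of the corrected filtering step in plain NumPy (keep_non_nan is a hypothetical helper name, not part of the question's code, and the inputs are small made-up arrays):

```python
import numpy as np

def keep_non_nan(output1, output2):
    """Keep only the positions where output1 is not NaN, then stack both rows."""
    # np.isnan is the reliable NaN test; negate it to get the "keep" mask.
    id_keep = np.nonzero(np.logical_not(np.isnan(output1)))
    return np.vstack((output1[id_keep], output2[id_keep]))

# Flattened outputs as they would look after the threshold loop.
output1 = np.array([1.0, np.nan, 1.0, np.nan])
output2 = np.array([0.0, np.nan, 0.0, np.nan])

result = keep_non_nan(output1, output2)
print(result.shape)  # (2, 2)
```

Inside function2, the equivalent change would be to replace output1 != np.nan with np.logical_not(np.isnan(output1)) in the id_keep line.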

The second implementation may be slightly slower due to the temporary array created by np.logical_not (once the code has been corrected, I did not see any statistically significant difference on my machine between using NaN and Inf).

Answer 1 | 3 | Accepted | 2022-01-17 19:36:08
