
Faster median in very large numpy arrays

I have a very large numpy array with dimensions (4000, 6000, 15).

I now want the median of each stack, i.e., along the third dimension. The current code works but is curiously slow: computing the median for a single stack [0,0,:] (15 values) takes at least half a second or so to complete.

import numpy as np

height = 4000
width = 6000
N = 15

poolmedian = np.zeros((height, width, 3))
RGBmedian = np.zeros((height, width, N), dtype=float)

for n in range(0, height):
    for m in range(0, width):
        poolmedian[n, m, 0] = np.median(RGBmedian[n, m, :])

You'll want to vectorize the median computation as much as possible. Every time you call a numpy function, you take a hit going back and forth between the C and Python layers. Do as much in the C layer as possible:

import numpy as np
height = 40
width = 60
N = 15

np.random.seed(1)
poolmedian = np.zeros((height,width,3))
RGBmedian = np.random.random((height,width,N))

def original():
    for n in range(0,height):
        for m in range(0,width):
            poolmedian[n,m,0] = np.median(RGBmedian[n,m,:])
    return poolmedian

def vectorized():
    # Note: np.median is only called ONCE, not n*m times.
    poolmedian[:, :, 0] = np.median(RGBmedian, axis=-1)
    return poolmedian


orig = original()
vec = vectorized()

np.testing.assert_array_equal(orig, vec)

You can see that the values are the same since the assert passes (although it's not clear why you need 3 dims in poolmedian). I put the above code in a file called test.py and am using IPython for its convenient %timeit. I also toned down the size a bit just so it runs faster, but you should get similar savings on your large data. The vectorized version is about 100x faster:

In [1]: from test import original, vectorized

In [2]: %timeit original()
69.1 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit vectorized()
618 µs ± 4.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In general, you want to use numpy's broadcasting rules and call a function as few times as possible. Calling functions in a loop is almost always a no-no if you're looking for performant numpy code.
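As a minimal sketch of what broadcasting buys you (this step is hypothetical and not part of the question), the per-pixel medians computed in a single call can be combined with the full stack without any Python loop; keepdims is a standard np.median argument that keeps the reduced axis so the shapes line up:

med = np.median(RGBmedian, axis=-1, keepdims=True)  # shape (height, width, 1)
centered = RGBmedian - med                          # (h, w, 1) broadcasts against (h, w, N)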

Addendum:

I've added the following function to test.py. Since there is another answer, I want to make it clear that calling a fully vectorized version (i.e., no loops) is faster, and I've also modified the code to use dims 4000 by 6000:

import numpy as np
height = 4000
width = 6000
N = 15

...

def fordy():
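    # Loop-based version from the other answer: sorts each 15-value stack
    # in place and averages the 7th and 8th of the sorted values.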
    for n in range(0,height):
        for m in range(0,width):
            array = RGBmedian[n,m,:]
            array.sort()
            poolmedian[n, m, 0] = (array[6] + array[7])/2
    return poolmedian

If we load all of this into IPython, we get:

In [1]: from test import original, fordy, vectorized

In [2]: %timeit original()
6.87 s ± 72.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit fordy()
262 ms ± 737 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit vectorized()
18.4 ms ± 149 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
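One further tweak you could try (not benchmarked here, so treat it as an assumption to verify on your own data): np.median accepts an overwrite_input flag that lets it partition the input buffer in place instead of copying it, which can help when you no longer need the original array afterwards:

# Hypothetical variant: lets np.median reuse RGBmedian's memory;
# RGBmedian may be left partially sorted after this call.
poolmedian[:, :, 0] = np.median(RGBmedian, axis=-1, overwrite_input=True)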

HTH.
