Fastest way to populate a matrix with a function on pairs of elements in two numpy vectors?
I have two 1-dimensional numpy vectors va and vb which are being used to populate a matrix by passing all pair combinations to a function.
na = len(va)
nb = len(vb)
D = np.zeros((na, nb))
for i in range(na):
    for j in range(nb):
        D[i, j] = foo(va[i], vb[j])
As it stands, this piece of code takes a very long time to run because va and vb are relatively large (4626 and 737 elements). However, I am hoping this can be improved, since a similar procedure is performed using the cdist method from scipy with very good performance.
D = cdist(va, vb, metric)
I am obviously aware that scipy has the benefit of running this piece of code in C rather than in Python, but I'm hoping there is some numpy function I'm unaware of that can execute this quickly.
cdist is fast because it is written in highly-optimized C code (as you already pointed out), and it only supports a small predefined set of metrics.
Since you want to apply the operation generically, to any given foo function, you have no choice but to call that function na-times-nb times. That part is not likely to be further optimizable.
What's left to optimize are the loops and the indexing. Some suggestions to try out:

- xrange instead of range (if on Python 2.x; in Python 3, range is already generator-like)
- enumerate, instead of range + explicit indexing
- cython or numba, to speed up the looping process

If you can make further assumptions about foo, it might be possible to speed it up further.
Like @shx2 said, it all depends on what foo is. If you can express it in terms of numpy ufuncs, then use the outer method:
In [11]: N = 400
In [12]: B = np.empty((N, N))
In [13]: x = np.random.random(N)
In [14]: y = np.random.random(N)
In [15]: %%timeit
for i in range(N):
    for j in range(N):
        B[i, j] = x[i] - y[j]
....:
10 loops, best of 3: 87.2 ms per loop
In [16]: %timeit A = np.subtract.outer(x, y) # <--- np.subtract is a ufunc
1000 loops, best of 3: 294 µs per loop
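If foo is built from elementwise numpy operations but is not a single ufunc, plain broadcasting achieves the same result as the outer method. A minimal sketch with the same subtraction example:

```python
import numpy as np

N = 400
x = np.random.random(N)
y = np.random.random(N)

# x[:, None] has shape (N, 1); broadcasting it against y's shape (N,)
# expands both to (N, N), computing every pairwise difference at once.
A = x[:, None] - y
```

This is equivalent to np.subtract.outer(x, y) and generalizes to any expression composed of elementwise operations.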
Otherwise you can push the looping down to the cython level. Continuing the trivial example above:
In [45]: %%cython
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def foo(double[::1] x, double[::1] y, double[:, ::1] out):
    cdef int i, j
    for i in xrange(x.shape[0]):
        for j in xrange(y.shape[0]):
            out[i, j] = x[i] - y[j]
....:
In [46]: foo(x, y, B)
In [47]: np.allclose(B, np.subtract.outer(x, y))
Out[47]: True
In [48]: %timeit foo(x, y, B)
10000 loops, best of 3: 149 µs per loop
The cython example is deliberately made overly simplistic: in reality you might want to add some shape/stride checks, allocate the memory within your function, etc.
One of the least known numpy functions, among what the docs call functional programming routines, is np.frompyfunc. This creates a numpy ufunc from a Python function. Not some other object that closely simulates a numpy ufunc, but a proper ufunc with all its bells and whistles. While the behavior is in many respects very similar to np.vectorize, it has some distinct advantages, which hopefully the following code should highlight:
In [2]: def f(a, b):
...:     return a + b
...:
In [3]: f_vec = np.vectorize(f)
In [4]: f_ufunc = np.frompyfunc(f, 2, 1) # 2 inputs, 1 output
In [5]: a = np.random.rand(1000)
In [6]: b = np.random.rand(2000)
In [7]: %timeit np.add.outer(a, b) # a baseline for comparison
100 loops, best of 3: 9.89 ms per loop
In [8]: %timeit f_vec(a[:, None], b) # 50x slower than np.add
1 loops, best of 3: 488 ms per loop
In [9]: %timeit f_ufunc(a[:, None], b) # ~20% faster than np.vectorize...
1 loops, best of 3: 425 ms per loop
In [10]: %timeit f_ufunc.outer(a, b) # ...and you get to use ufunc methods
1 loops, best of 3: 427 ms per loop
So while it is still clearly inferior to a properly vectorized implementation, it is a little faster (the looping is in C, but you still have the Python function-call overhead).
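One caveat worth noting: a ufunc created by np.frompyfunc always returns arrays of dtype object, since numpy cannot know the Python function's return type, so you typically want to cast the result back to a numeric dtype. A small sketch continuing the example above:

```python
import numpy as np

def f(a, b):
    return a + b

# 2 inputs, 1 output, as in the timing example above
f_ufunc = np.frompyfunc(f, 2, 1)

a = np.random.rand(10)
b = np.random.rand(20)

# frompyfunc makes no promises about the return type,
# so the result array has dtype object...
out = f_ufunc.outer(a, b)

# ...and usually needs an explicit cast for further numeric work
out = out.astype(np.float64)
```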