Fastest way to populate a matrix with a function on pairs of elements in two numpy vectors?

I have two one-dimensional numpy vectors, va and vb, which are used to populate a matrix by passing every pair of elements to a function.

na = len(va)
nb = len(vb)
D = np.zeros((na, nb))
for i in range(na):
    for j in range(nb):
        D[i, j] = foo(va[i], vb[j])

As it stands, this piece of code takes a very long time to run because va and vb are relatively large (4626 and 737 elements). However, I am hoping it can be improved, because a similar procedure is performed by the cdist method from scipy with very good performance.

D = cdist(va, vb, metric)

I am aware that scipy has the benefit of running this piece of code in C rather than in Python, but I'm hoping there is some numpy function I'm unaware of that can execute this quickly.

cdist is fast because it is written in highly optimized C code (as you already pointed out), and its optimized code paths cover only a small predefined set of metrics.

Since you want to apply the operation generically, to any given foo function, you have no choice but to call that function na * nb times. That part is not likely to be further optimizable.

What's left to optimize are the loops and the indexing. Some suggestions to try out:

  1. Use xrange instead of range (in Python 2.x only; in Python 3, range is already lazy and does not build a list)
  2. Use enumerate instead of range plus explicit indexing
  3. Use a Python speed-up tool, such as cython or numba, to accelerate the looping process.
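Suggestion 2 could look roughly like the sketch below. The function foo here is a hypothetical stand-in (an absolute difference), and fill_matrix is a name chosen only for illustration; neither comes from the question.

```python
import numpy as np

def foo(a, b):
    # Hypothetical stand-in for the real pairwise function.
    return abs(a - b)

def fill_matrix(va, vb, func):
    """Fill D[i, j] = func(va[i], vb[j]), iterating with enumerate."""
    D = np.empty((len(va), len(vb)))
    for i, a in enumerate(va):
        row = D[i]  # view onto row i, so writes land in D
        for j, b in enumerate(vb):
            row[j] = func(a, b)
    return D

va = np.array([1.0, 2.0, 3.0])
vb = np.array([0.5, 2.5])
D = fill_matrix(va, vb, foo)
```

This only trims the indexing overhead; the na * nb calls to the Python function remain, so the gain is likely to be modest.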

If you can make further assumptions about foo, it might be possible to speed it up further.

Like @shx2 said, it all depends on what foo is. If you can express it in terms of numpy ufuncs, then use the ufunc's outer method:

In [11]: N = 400

In [12]: B = np.empty((N, N))

In [13]: x = np.random.random(N)

In [14]: y = np.random.random(N)

In [15]: %%timeit
for i in range(N):
   for j in range(N):
     B[i, j] = x[i] - y[j]
   ....: 
10 loops, best of 3: 87.2 ms per loop

In [16]: %timeit A = np.subtract.outer(x, y)   # <--- np.subtract is a ufunc
1000 loops, best of 3: 294 µs per loop
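When foo combines several ufuncs rather than being a single one, broadcasting should give the same pairwise result as outer. A minimal sketch, assuming a hypothetical foo(a, b) = sqrt(a**2 + b**2), which is not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(400)
y = rng.random(300)

# Hypothetical pairwise function: foo(a, b) = sqrt(a**2 + b**2).
# x[:, None] has shape (400, 1) and y[None, :] has shape (1, 300);
# broadcasting expands the expression to shape (400, 300).
D = np.sqrt(x[:, None] ** 2 + y[None, :] ** 2)
```

Each intermediate (such as x[:, None] ** 2 + y[None, :] ** 2) is a full-size temporary array, so memory use grows with na * nb.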

Otherwise you can push the looping down to the cython level. Continuing the trivial example above:

In [45]: %%cython
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def foo(double[::1] x, double[::1] y, double[:, ::1] out):
    cdef int i, j
    for i in xrange(x.shape[0]):
        for j in xrange(y.shape[0]):
            out[i, j] = x[i] - y[j]
   ....: 

In [46]: foo(x, y, B)

In [47]: np.allclose(B, np.subtract.outer(x, y))
Out[47]: True

In [48]: %timeit foo(x, y, B)
10000 loops, best of 3: 149 µs per loop

The cython example is deliberately oversimplified: in reality you might want to add some shape/stride checks, allocate the memory within your function, etc.

One of the least-known numpy functions, among what the docs call functional programming routines, is np.frompyfunc. This creates a numpy ufunc from a Python function. Not some other object that closely simulates a numpy ufunc, but a proper ufunc with all its bells and whistles. While the behavior is in many respects very similar to np.vectorize, it has some distinct advantages, which the following code should highlight:

In [2]: def f(a, b):
   ...:     return a + b
   ...:

In [3]: f_vec = np.vectorize(f)

In [4]: f_ufunc = np.frompyfunc(f, 2, 1)  # 2 inputs, 1 output

In [5]: a = np.random.rand(1000)

In [6]: b = np.random.rand(2000)

In [7]: %timeit np.add.outer(a, b)  # a baseline for comparison
100 loops, best of 3: 9.89 ms per loop

In [8]: %timeit f_vec(a[:, None], b)  # 50x slower than np.add
1 loops, best of 3: 488 ms per loop

In [9]: %timeit f_ufunc(a[:, None], b)  # ~20% faster than np.vectorize...
1 loops, best of 3: 425 ms per loop

In [10]: %timeit f_ufunc.outer(a, b)  # ...and you get to use ufunc methods
1 loops, best of 3: 427 ms per loop

So while it is still clearly inferior to a properly vectorized implementation, it is a little faster than np.vectorize (the looping is in C, but you still have the Python function call overhead).
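One detail worth checking when using np.frompyfunc: the resulting ufunc returns arrays of dtype object, so a cast is usually needed before further numeric work. A small sketch with the same f as above (the array sizes are arbitrary):

```python
import numpy as np

def f(a, b):
    return a + b

f_ufunc = np.frompyfunc(f, 2, 1)  # 2 inputs, 1 output

a = np.random.rand(5)
b = np.random.rand(3)

D_obj = f_ufunc.outer(a, b)  # shape (5, 3), dtype object
D = D_obj.astype(float)      # cast to a regular float64 array
```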

