Optimize operations with arrays in numpy

I have to apply a mathematical formula that I've written in Python as:

    PHi2 = []
    for s in range(tdim):
        sum1 = 0.0
        for i in range(dim):
            for j in range(dim):
                sum1 += (0.5*np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])
                         - 0.5*np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))

        PHi2.append(sum1)

This is correct but clearly inefficient. The alternative I tried is:

for i in range(dim):
    for j in range(dim):
        PHi2 = 0.5*np.cos(theta*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])-0.5*np.sin(theta*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j])

However, the second example gives me the same number in every element of PHi2, so it is faster but the answer is wrong. How can I do this correctly and more efficiently?

NOTE: eig1 and eig2 have the same length d, and theta and PHi2 have the same length D, but d != D.

You can use a brute-force broadcasting approach, but it creates an intermediate array of shape (D, d, d), which can get out of hand if your arrays are even moderately large. Furthermore, broadcasting with no refinements repeats a lot of work from the innermost loop that you only need to do once. If you first compute the necessary parameters for all possible values of i - j and add them together, you can reuse those values in the outer loop, e.g.:

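import numpy as np

def fast_ops(eig1, eig2, theta):
    d = len(eig1)
    d_arr = np.arange(d)
    # i - j for every pair (i, j); values run from -(d - 1) to d - 1
    i_j = d_arr[:, None] - d_arr[None, :]
    # shift to non-negative bin indices 0 .. 2d - 2 for np.bincount
    reidx = i_j + d - 1
    # theta-independent factors multiplying cos and sin, shape (d, d)
    mult1 = eig1[:, None] * eig1[None, :] + eig2[:, None] + eig2[None, :]
    mult2 = eig1[None, :] * eig2[:, None] - eig1[:, None] * eig2[None, :]
    # add up the factors of all pairs that share the same i - j
    mult1_reidx = np.bincount(reidx.ravel(), weights=mult1.ravel())
    mult2_reidx = np.bincount(reidx.ravel(), weights=mult2.ravel())

    # one angle per (theta, i - j) combination, shape (D, 2d - 1)
    angles = theta[:, None] * np.arange(1 - d, d)

    return 0.5 * (np.einsum('ij,j->i', np.cos(angles), mult1_reidx) -
                  np.einsum('ij,j->i', np.sin(angles), mult2_reidx))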
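To see what the np.bincount step is doing, here is a small sketch (my own illustration, not part of the timing below). The array mult is random placeholder data standing in for mult1 or mult2. All entries with the same i - j lie on one diagonal of the (d, d) array, so the grouped sums should match the diagonal sums obtained with np.trace:

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
mult = rng.random((d, d))          # placeholder for mult1 or mult2

idx = np.arange(d)
i_j = idx[:, None] - idx[None, :]  # i - j, values in -(d - 1) .. d - 1
reidx = i_j + d - 1                # shifted to 0 .. 2d - 2 so bincount accepts it

# bincount adds up all weights that fall in the same bin, i.e. share i - j
grouped = np.bincount(reidx.ravel(), weights=mult.ravel())

# The same sums taken diagonal by diagonal: i - j == k is diagonal offset -k
by_diagonal = np.array([np.trace(mult, offset=-k) for k in range(1 - d, d)])

print(np.allclose(grouped, by_diagonal))
```

Because the cosine and sine only depend on theta and i - j, the d * d pairs collapse to 2d - 1 distinct angles per theta, which is where the saving comes from.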

If we rewrite M4rtini's code as a function for comparison:

def fast_ops1(eig1, eig2, theta):
    d = len(eig1)
    D = len(theta)
    s = np.array(range(D))[:, None, None]
    i = np.array(range(d))[:, None]
    j = np.array(range(d))
    ret = 0.5 * (np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j]) -
                 np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))
    return ret.sum(axis=(-1, -2))

And we make up some data:

d, D = 100, 200
eig1 = np.random.rand(d)
eig2 = np.random.rand(d)
theta = np.random.rand(D)

The speed improvement is very noticeable: roughly 80x on top of the 115x over your original code, for a whopping 9000x overall speed-up:

In [22]: np.allclose(fast_ops1(eig1, eig2, theta), fast_ops(eig1, eig2, theta))
Out[22]: True

In [23]: %timeit fast_ops1(eig1, eig2, theta)
10 loops, best of 3: 145 ms per loop

In [24]: %timeit fast_ops(eig1, eig2, theta)
1000 loops, best of 3: 1.85 ms per loop

This works by broadcasting. For tdim = 200 and dim = 100, the original takes 14 seconds and this version takes 120 ms.

s = np.array(range(tdim))[:, None, None]
i = np.array(range(dim))[:, None]
j = np.array(range(dim))
PHi2 =(0.5*np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])-0.5*np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j])).sum(axis=2).sum(axis=1)
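As a rough illustration of how the three index arrays combine, the sketch below only inspects shapes and memory use. The theta values are placeholder data, and size_mb is a name I introduced for the example:

```python
import numpy as np

tdim, dim = 200, 100
s = np.arange(tdim)[:, None, None]   # shape (tdim, 1, 1)
i = np.arange(dim)[:, None]          # shape (dim, 1)
j = np.arange(dim)                   # shape (dim,)

theta = np.linspace(0.0, 1.0, tdim)  # placeholder data

# (tdim, 1, 1) combined with (dim, dim) broadcasts to (tdim, dim, dim)
angles = theta[s] * (i - j)

size_mb = angles.nbytes / 1e6        # float64 uses 8 bytes per element
print(angles.shape, size_mb)
```

Each intermediate in the full expression has this same (tdim, dim, dim) shape, so memory grows with tdim * dim**2 even though the final result only has tdim elements.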

In the first bit of code, you have 0.5*np.cos(theta[s]*(i-j))... but in the second it's 0.5*np.cos(theta*(i-j)).... Unless you've got theta defined differently for the second bit of code, this could well be the cause of the trouble.
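A further difference, as far as I can tell from the code as posted: the first snippet accumulates with +=, while the second assigns with =, so only the last term (i = j = dim - 1) survives. For that term i - j is 0, so the cosine is 1 and the sine is 0 for every theta, which would explain the identical values. The sketch below assumes theta is a NumPy array. The helper term and the small random inputs are mine, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, tdim = 5, 7
eig1, eig2 = rng.random(dim), rng.random(dim)
theta = rng.random(tdim)

def term(th, i, j):
    """One (i, j) term of the formula; th may be a scalar or an array."""
    return (0.5 * np.cos(th * (i - j)) * (eig1[i] * eig1[j] + eig2[i] + eig2[j])
            - 0.5 * np.sin(th * (i - j)) * (eig1[j] * eig2[i] - eig1[i] * eig2[j]))

# As in the second snippet: plain assignment keeps only the last term
for i in range(dim):
    for j in range(dim):
        overwritten = term(theta, i, j)

# Accumulating instead, vectorized over theta only
accumulated = np.zeros_like(theta)
for i in range(dim):
    for j in range(dim):
        accumulated += term(theta, i, j)

# Triple-loop reference, equivalent to the first snippet
reference = np.array([sum(term(t, i, j) for i in range(dim) for j in range(dim))
                      for t in theta])

print(np.allclose(overwritten, overwritten[0]))  # every element is identical
print(np.allclose(accumulated, reference))
```

This still loops dim * dim times in Python, so it only removes the loop over theta. The broadcasting and bincount versions avoid the remaining loops as well.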
