Can this Python function be vectorized?

Question

I have been working on this function, which generates some parameters I need for a simulation code I am developing, and I have hit a wall trying to improve its performance.

Profiling the code shows that this function is the main bottleneck, so any improvement I can make to it, however minor, would be great.

I wanted to try to vectorize parts of this function, but I am not sure if it is possible.

The main challenge is that the values stored in my array params depend on the indices of params. The only straightforward solution I saw was np.ndenumerate, but it seems to be pretty slow.

Is it possible to vectorize this type of operation, where the values stored in the array depend on where they are being stored? Or would it be smarter/faster to create a generator that just gives me the tuples of array indices?

import numpy as np
from scipy.sparse import linalg as LA

def get_params(num_bonds, energies):
    """
    Returns the interaction parameters of different pairs of atoms.

    Parameters
    ----------
    num_bonds : ndarray, shape = (M, 20)
        Sparse array containing the number of nearest neighbor bonds for 
        different pairs of atoms (denoted by their column) and next-
        nearest neighbor bonds. Columns 0-9 contain nearest neighbors, 
        10-19 contain next-nearest neighbors

    energies : ndarray, shape = (M, )
        Energy vector corresponding to each atomic system stored in each 
        row of num_bonds.
    """

    # -- Compute the bond energies
    x = LA.lsqr(num_bonds, energies, show=False)[0]

    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])

    nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
          (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
          (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
          (3,2): x[9]}

    nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
           (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
           (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
           (3,2): x[19]}

    """
    params contains the energy contribution of each site due to its
    local environment. The shape is given by the number of possible atom
    types and the number of sites in the lattice.
    """
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):

        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]

    return np.ascontiguousarray(params)

Answer 1

Here's a vectorized approach using broadcasted summations -

# Gather the elements sorted by the keys in (row,col) order of a dense
# 2D array for both nn and nnn.
# list(...) is needed on Python 3, where dict views are not sequences.
sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T,(4,4)).argsort()
a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)

sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T,(4,4)).argsort()
a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)

# Perform the summations, keeping the first axis aligned for the nn and nnn parts
parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
     a0[:,None,None,:,None] + a0[:,None,None,None,:]

parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
     a1[:,None,None,:,None] + a1[:,None,None,None,:]    

# Finally add up sums from nn and nnn for final output    
out = parte0[...,None,None,None,None] + parte1[:,None,None,None,None]
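To see what the None-indexed summations are doing, it may help to look at a smaller case with only two terms. The following is a minimal sketch, assuming a random 4x4 array as a stand-in for the dense nn table; the names broadcast_sum and loop_sum are illustrative only. Each None inserts a length-1 axis, so every term varies along the shared first axis plus exactly one other axis, and NumPy stretches the length-1 axes when the terms are added.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 4))  # stand-in for the dense nn table (assumption)

# a[:, :, None] has shape (4, 4, 1) and depends on (i, j);
# a[:, None, :] has shape (4, 1, 4) and depends on (i, k).
# Adding them broadcasts both to shape (4, 4, 4).
broadcast_sum = a[:, :, None] + a[:, None, :]

# Reference: the same quantity filled in one element at a time.
loop_sum = np.empty((4, 4, 4))
for (i, j, k), _ in np.ndenumerate(loop_sum):
    loop_sum[i, j, k] = a[i, j] + a[i, k]

print(broadcast_sum.shape)
print(np.allclose(broadcast_sum, loop_sum))
```

The nine-axis case follows the same pattern, with one extra None-padded term per additional index.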

Runtime test

Function definitions -

def vectorized_approach(nn,nnn):
    # list(...) keeps this working on Python 3, where dict views are not sequences
    sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T,(4,4)).argsort()
    a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)
    sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T,(4,4)).argsort()
    a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)
    parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
         a0[:,None,None,:,None] + a0[:,None,None,None,:]    
    parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
         a1[:,None,None,:,None] + a1[:,None,None,None,:]
    return parte0[...,None,None,None,None] + parte1[:,None,None,None,None]

def original_approach(nn,nnn):
    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):    
        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]
    return params

Set up inputs -

# Setup inputs
x = np.random.rand(30)
nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
      (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
      (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
      (3,2): x[9]}

nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
       (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
       (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
       (3,2): x[19]}
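Since both dictionaries are symmetric and filled from x in a fixed order (four diagonal entries, then the pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)), the dense 4x4 tables could probably also be built from x directly, without going through dictionary keys. A minimal sketch under that ordering assumption; dense_table is a hypothetical helper, not part of the original code.

```python
import numpy as np

def dense_table(coeffs, n=4):
    """Hypothetical helper: build a symmetric (n, n) table from n diagonal
    values followed by the upper-triangle values in row-major order."""
    coeffs = np.asarray(coeffs, dtype=float)
    table = np.zeros((n, n))
    # np.triu_indices(n, k=1) yields (0,1), (0,2), ..., (n-2,n-1) in row-major order
    rows, cols = np.triu_indices(n, k=1)
    table[rows, cols] = coeffs[n:]
    table = table + table.T                  # mirror the off-diagonal entries
    table[np.diag_indices(n)] = coeffs[:n]   # then fill the diagonal
    return table

x = np.random.rand(30)
a0 = dense_table(x[:10])    # plays the role of nn
a1 = dense_table(x[10:20])  # plays the role of nnn
```

With this, a0[i, j] should equal nn[(i, j)] for every key, and likewise for a1 and nnn.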

Timings -

In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn))
Out[98]: True

In [99]: %timeit original_approach(nn,nnn)
1 loops, best of 3: 884 ms per loop

In [100]: %timeit vectorized_approach(nn,nnn)
1000 loops, best of 3: 708 µs per loop

Welcome to a 1000x+ speedup!
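To reproduce a comparison like this outside IPython, one option is the standard-library timing tools. The following is a minimal sketch, assuming random dense 4x4 arrays as stand-ins for the nn and nnn tables; loop_version and broadcast_version are illustrative names, and the absolute numbers will differ from machine to machine.

```python
import time
import timeit
import numpy as np

def loop_version(a0, a1):
    # Element-by-element fill, mirroring the np.ndenumerate loop
    out = np.zeros((4,) * 9)
    for (i, j, k, l, m, jj, kk, ll, mm), _ in np.ndenumerate(out):
        out[i, j, k, l, m, jj, kk, ll, mm] = (
            a0[i, j] + a0[i, k] + a0[i, l] + a0[i, m]
            + a1[i, jj] + a1[i, kk] + a1[i, ll] + a1[i, mm])
    return out

def broadcast_version(a0, a1):
    # Same sums, built with None-padded broadcasting
    p0 = (a0[:, :, None, None, None] + a0[:, None, :, None, None]
          + a0[:, None, None, :, None] + a0[:, None, None, None, :])
    p1 = (a1[:, :, None, None, None] + a1[:, None, :, None, None]
          + a1[:, None, None, :, None] + a1[:, None, None, None, :])
    return p0[..., None, None, None, None] + p1[:, None, None, None, None]

rng = np.random.default_rng(0)
a0, a1 = rng.random((4, 4)), rng.random((4, 4))

# Time the slow loop once, keeping its result as the reference
start = time.perf_counter()
reference = loop_version(a0, a1)
t_loop = time.perf_counter() - start

# Time the fast version as the best of 3 batches of 100 calls
t_vec = min(timeit.repeat(lambda: broadcast_version(a0, a1),
                          number=100, repeat=3)) / 100

print(np.allclose(reference, broadcast_version(a0, a1)))
print(f"loop: {t_loop:.3f} s, broadcast: {t_vec * 1e6:.0f} us")
```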

For a generic number of such outer-product-style terms, here's a generic solution that iterates through those dimensions -

m,n = a0.shape # size of output array along each axis
N = 4  # Order of system
out = a0.copy()
for i in range(1,N):
    out = out[...,None] + a0.reshape((m,)+(1,)*i+(n,))

# out already has N+1 axes here, so a1 needs N+i singleton axes (N, not n)
for i in range(N):
    out = out[...,None] + a1.reshape((m,)+(1,)*(i+N)+(n,))
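The generic loops can be wrapped in a function and checked against a brute-force fill. This is a minimal sketch, not the answer's own code: outer_sums and its order parameter are hypothetical names, and the example deliberately uses a non-square (3, 5) table with order 2 so that the number of terms and the table width are clearly separate quantities.

```python
import numpy as np

def outer_sums(a0, a1, order):
    """Hypothetical wrapper: out[i, j1..j_order, k1..k_order] equals
    sum of a0[i, j_t] over t plus sum of a1[i, k_t] over t."""
    m, n = a0.shape
    out = a0.copy()
    # Append one new axis per additional a0 term
    for i in range(1, order):
        out = out[..., None] + a0.reshape((m,) + (1,) * i + (n,))
    # out now has order + 1 axes, so a1 is padded with order + i singleton axes
    for i in range(order):
        out = out[..., None] + a1.reshape((m,) + (1,) * (order + i) + (n,))
    return out

rng = np.random.default_rng(0)
a0, a1 = rng.random((3, 5)), rng.random((3, 5))
out = outer_sums(a0, a1, order=2)

# Brute-force reference for order 2
ref = np.empty((3, 5, 5, 5, 5))
for (i, j, k, jj, kk), _ in np.ndenumerate(ref):
    ref[i, j, k, jj, kk] = a0[i, j] + a0[i, k] + a1[i, jj] + a1[i, kk]

print(out.shape)
print(np.allclose(out, ref))
```

With order=4 and the 4x4 tables, this should give the same nine-axis array as the explicit version.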

Answer 1: 3, accepted, 2016-10-12 21:40:55
