Is there a faster way to add two 2-D NumPy arrays?

Let's say I have two large 2-D NumPy arrays of the same dimensions (say 2000x2000). I want to sum them element-wise. I was wondering if there is a faster way than np.add().

Edit: I am adding an example similar to what I am using now. Is there a way to speed this up?

# a and b are the two matrices I already have; each is 2000x2000
# shift is a previously known list of row shifts
for j in range(100000):
    b = np.roll(b, shift[j], axis=0)
    a = np.add(a, b)

Approach #1 (Vectorized)

We can use modulus to simulate the circular behavior of roll/circshift. With broadcasted indices covering all rows, this gives a fully vectorized approach, shown below. It assumes shift is a NumPy array; a plain list can be converted with np.asarray(shift).

n = b.shape[0]
idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
a += b[idx].sum(0)
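To see why the index expression matches the loop, note that np.roll(b, s, axis=0) puts row (i - s) mod n of b at row i, and that successive rolls add up to a roll by the cumulative shift. The sketch below checks this on a small array. The sizes, the seed, and the names idx_simple, expected and result are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for the demo
n, m = 5, 7                             # small illustrative sizes
a = rng.integers(0, 10, size=(n, n))
b = rng.integers(0, 10, size=(n, n))
shift = rng.integers(0, n, size=m)

# Reference result: repeated rolling, as in the question
expected = a.copy()
rolled = b.copy()
for s in shift:
    rolled = np.roll(rolled, s, axis=0)
    expected += rolled

# Index expression from Approach #1
idx = n - 1 - np.mod(shift.cumsum()[:, None] - 1 - np.arange(n), n)

# Equivalent shorter form: row i of the j-th rolled array is b[(i - S_j) % n],
# where S_j is the cumulative shift after j+1 steps
idx_simple = (np.arange(n) - shift.cumsum()[:, None]) % n

result = a + b[idx].sum(0)
print(np.array_equal(idx, idx_simple), np.array_equal(result, expected))
```

Note that b[idx] is an intermediate array of shape (len(shift), n, number of columns), so its memory use grows with the number of shifts.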

Approach #2 (Loopy one)

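n = b.shape[0]
# Append the first n-1 rows of b so every rolled version is a contiguous block.
# np.vstack is equivalent to np.row_stack, which newer NumPy releases deprecate.
b_ext = np.vstack((b, b[:-1]))
start_idx = n-1 - np.mod(shift.cumsum()-1, n)
for j in range(start_idx.size):
    a += b_ext[start_idx[j]:start_idx[j]+n]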

Colon notation vs using indices for slicing

The idea here is to do minimal work once we are inside the loop. We precompute the start row index of each iteration before entering the loop. Inside the loop, all that remains is to slice with colon notation, which gives a view into the array, and add that view to a. This should be much cheaper than rolling, which has to compute all of those row indices and produces a copy on every iteration.
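As a small check of that claim, the sketch below confirms that each n-row window of the extended array equals the corresponding np.roll result and shares memory with the extended array. The array contents and the names all_match and all_views are illustrative assumptions; np.vstack is used here as the equivalent of np.row_stack.

```python
import numpy as np

n = 4
b = np.arange(n * 3).reshape(n, 3)       # small illustrative array
b_ext = np.vstack((b, b[:-1]))           # 2n-1 rows: b followed by its first n-1 rows

all_match = True
all_views = True
for s in range(2 * n):                   # cumulative shifts, including values >= n
    start = n - 1 - (s - 1) % n          # same formula as start_idx in Approach #2
    window = b_ext[start:start + n]      # contiguous slice of n rows
    all_match &= np.array_equal(window, np.roll(b, s, axis=0))
    all_views &= np.shares_memory(window, b_ext)   # a view, so no rows are copied

print(all_match, all_views)
```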

Here's a bit more on views and copies when slicing with colon notation versus indexing with a list of indices:

In [11]: a = np.random.randint(0,9,(10))

In [12]: a
Out[12]: array([8, 0, 1, 7, 5, 0, 6, 1, 7, 0])

In [13]: a[3:8]
Out[13]: array([7, 5, 0, 6, 1])

In [14]: a[[3,4,5,6,7]]
Out[14]: array([7, 5, 0, 6, 1])

In [15]: np.may_share_memory(a, a[3:8])
Out[15]: True

In [16]: np.may_share_memory(a, a[[3,4,5,6,7]])
Out[16]: False
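A practical consequence, sketched below with hypothetical variable names: writing through a colon slice changes the original array, while writing to the result of list indexing does not.

```python
import numpy as np

a = np.arange(10)
view = a[3:8]               # basic slicing: a view on a's buffer
copy = a[[3, 4, 5, 6, 7]]   # integer-array (fancy) indexing: a new array

view[0] = -1                # writes through to a[3]
copy[1] = -2                # leaves a[4] untouched

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```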

Runtime test

Function definitions:

def original_loopy_app(a,b):
    for j in range(shift.size):
        b=np.roll(b, shift[j] , axis=0)
        a += b

def vectorized_app(a,b):
    n = b.shape[0]
    idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
    a += b[idx].sum(0)

def modified_loopy_app(a,b):
    n = b.shape[0]
    # np.vstack is equivalent to np.row_stack, which newer NumPy releases deprecate
    b_ext = np.vstack((b, b[:-1] ))
    start_idx = n-1 - np.mod(shift.cumsum()-1,n)
    for j in range(start_idx.size):
        a += b_ext[start_idx[j]:start_idx[j]+n]

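Case #1:

Note: a1, a2, a3 and b1, b2, b3 used below are not defined in the listing. Presumably they are independent copies of a and b (e.g. a.copy()), which are needed because each function adds into its first argument in place.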

In [5]: # Setup input arrays
   ...: N = 200
   ...: M = 1000
   ...: a = np.random.randint(11,99,(N,N))
   ...: b = np.random.randint(11,99,(N,N))
   ...: shift = np.random.randint(0,N,M)
   ...: 

In [6]: original_loopy_app(a1,b1)
   ...: vectorized_app(a2,b2)
   ...: modified_loopy_app(a3,b3)
   ...: 

In [7]: np.allclose(a1, a2) # Verify results
Out[7]: True

In [8]: np.allclose(a1, a3) # Verify results
Out[8]: True

In [9]: %timeit original_loopy_app(a1,b1)
   ...: %timeit vectorized_app(a2,b2)
   ...: %timeit modified_loopy_app(a3,b3)
   ...: 
10 loops, best of 3: 107 ms per loop
10 loops, best of 3: 137 ms per loop
10 loops, best of 3: 48.2 ms per loop

Case #2:

In [13]: # Setup input arrays (datasets are exactly 1/10th of original sizes)
    ...: N = 200
    ...: M = 10000
    ...: a = np.random.randint(11,99,(N,N))
    ...: b = np.random.randint(11,99,(N,N))
    ...: shift = np.random.randint(0,N,M)
    ...: 

In [14]: %timeit original_loopy_app(a1,b1)
    ...: %timeit modified_loopy_app(a3,b3)
    ...: 
1 loops, best of 3: 1.11 s per loop
1 loops, best of 3: 481 ms per loop

So the modified loopy approach gives a speedup of more than 2x there (1.11 s vs. 481 ms, roughly 2.3x).
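Outside IPython, a similar comparison can be run with the standard timeit module. The following is a minimal sketch, not the original benchmark: the function names roll_loop and slice_loop are hypothetical, shift is passed explicitly instead of being read from a global, and the sizes are reduced so it finishes quickly. Timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def roll_loop(a, b, shift):
    # Baseline from the question: roll b cumulatively and accumulate into a
    for s in shift:
        b = np.roll(b, s, axis=0)
        a += b
    return a

def slice_loop(a, b, shift):
    # Approach #2: precompute start rows, then add views of the extended array
    n = b.shape[0]
    b_ext = np.vstack((b, b[:-1]))
    start_idx = n - 1 - np.mod(shift.cumsum() - 1, n)
    for start in start_idx:
        a += b_ext[start:start + n]
    return a

rng = np.random.default_rng(0)           # arbitrary seed
N, M = 100, 500                          # reduced sizes, chosen for illustration
a = rng.integers(11, 99, size=(N, N))
b = rng.integers(11, 99, size=(N, N))
shift = rng.integers(0, N, size=M)

# Each call gets its own copy of a because both functions add in place
same = np.array_equal(roll_loop(a.copy(), b, shift),
                      slice_loop(a.copy(), b, shift))

t_roll = timeit.timeit(lambda: roll_loop(a.copy(), b, shift), number=3)
t_slice = timeit.timeit(lambda: slice_loop(a.copy(), b, shift), number=3)
print(f"results match: {same}; roll: {t_roll:.3f}s, slice: {t_slice:.3f}s (3 runs each)")
```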
