Cumulative dot product with numpy

I have an ndarray A holding N square DxD matrices (shape (N, D, D)). I want to transform it into an ndarray B of the same shape, where B[0] = A[0] and, for every i > 0, B[i] = np.dot(B[i-1], A[i]). While a basic implementation is obvious, I wondered whether this operation has a faster implementation than a for loop.

For example, here is another way to perform the calculation:

  1. B[0...N/2 - 1] = compute for A[0]...A[N/2 - 1] the basic way
  2. B[N/2...N - 1] = compute for A[N/2]...A[N - 1] the basic way (starting afresh from A[N/2])
  3. return np.concatenate((B[0...N/2 - 1], np.matmul(B[N/2 - 1], B[N/2...N - 1])))

The emphasis is that steps 1 and 2 can be done in parallel and step 3 is a vectorized operation, and that this split can be applied recursively to each half of the array as needed. This makes me wonder whether a better option than the basic for loop exists (e.g. whether what I'm suggesting is already implemented or is an actual improvement, or whether another option is preferable).
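The three steps above can be sketched as a recursive function. This is only an illustration of the idea, not a benchmarked improvement: the names cumdot_loop and cumdot_split and the min_size cutoff are hypothetical, the two halves are computed one after the other rather than in parallel, and step 3 uses np.matmul because it broadcasts a single (D, D) matrix over a stack of shape (M, D, D).

```python
import numpy as np

def cumdot_loop(A):
    """Reference implementation: sequential cumulative matrix product."""
    B = np.empty_like(A)
    B[0] = A[0]
    for i in range(1, A.shape[0]):
        B[i] = B[i - 1] @ A[i]
    return B

def cumdot_split(A, min_size=64):
    """Divide-and-conquer variant of the three steps.

    min_size (hypothetical, must be >= 1) is the chunk length at or below
    which the plain loop is used.
    """
    n = A.shape[0]
    if n <= min_size:
        return cumdot_loop(A)
    half = n // 2
    left = cumdot_split(A[:half], min_size)   # step 1
    right = cumdot_split(A[half:], min_size)  # step 2, independent of step 1
    # Step 3: left-multiply every partial product of the right half by the
    # last product of the left half, in one broadcast matmul call.
    right = np.matmul(left[-1], right)
    return np.concatenate((left, right))
```

Because matrix multiplication is associative, both functions compute the same products up to floating-point rounding, so a comparison should use np.allclose rather than exact equality. Note that the split performs more matrix multiplications in total than the plain loop; any gain would have to come from running steps 1 and 2 concurrently and from batching step 3.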

Many thanks,

Yiftach

Edit: code for most basic implementation, for benchmarking:

import numpy as np

def cumdot(A):
    B = np.empty(A.shape)
    B[0] = A[0]
    for i in range(1, A.shape[0]):
        B[i] = B[i - 1] @ A[i]
    return B

Edit2: It seems that in numpy, all ufuncs support .accumulate() (which is exactly what I'm trying to do), but matmul (which behaves like a dot product) is a generalized ufunc. That means matmul is not a function from two scalars to one scalar, but from two matrices to a matrix. So although matmul.accumulate exists, calling it raises an exception stating that accumulate is not callable on ufuncs that have a signature. If this can be made to work despite the signature, I'd also love to know.
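A small sketch of the difference described here: an ordinary ufunc such as np.add has no core signature and accumulates fine, while np.matmul carries one. The exact exception type and message for matmul may differ between NumPy versions, so the sketch catches a generic Exception and prints whatever is raised.

```python
import numpy as np

# An ordinary ufunc acts elementwise, has no core signature, and supports accumulate.
print(np.add.signature)                        # None
print(np.add.accumulate(np.array([1, 2, 3])))  # [1 3 6]

# matmul is a generalized ufunc: it carries a core-dimension signature string.
print(np.matmul.signature)

A = np.stack([np.eye(2)] * 3)  # three 2x2 identity matrices, shape (3, 2, 2)
try:
    np.matmul.accumulate(A)
except Exception as exc:  # exact type and message may vary by NumPy version
    print(type(exc).__name__, exc)
```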

I don't think there is a fully vectorized way to do this with just numpy functions (but I'd be happy to be proven wrong!).

You can hide the loop by using itertools.accumulate to generate the cumulative products. Here's an example. To create A, I'll use N random orthogonal matrices, generated using scipy.stats.ortho_group, to ensure that the products remain bounded.

The first few lines here create A, with shape (1000, 4, 4).

In [101]: from scipy.stats import ortho_group

In [102]: N = 1000

In [103]: D = 4

In [104]: A = ortho_group.rvs(D, size=N)

Compute the cumulative products with itertools.accumulate, and put the result in a numpy array.

In [105]: from itertools import accumulate

In [106]: B = np.array(list(accumulate(A, np.matmul)))

Verify that we get the same result from cumdot(A).

In [107]: def cumdot(A):
     ...:     B = np.empty(A.shape)
     ...:     B[0] = A[0]
     ...:     for i in range(1, A.shape[0]):
     ...:         B[i] = B[i - 1] @ A[i]
     ...:     return B
     ...:

In [108]: B0 = cumdot(A)

In [109]: (B == B0).all()
Out[109]: True

Check the performance. It turns out that using itertools.accumulate is slightly faster.

In [110]: %timeit B0 = cumdot(A)
2.89 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [111]: %timeit B = np.array(list(accumulate(A, np.matmul)))
2.44 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
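For reference, the comparison can be reproduced outside IPython with a standalone script along these lines. This is a sketch under a few assumptions: the name cumdot_accumulate is introduced here for illustration, the orthogonal matrices come from a QR decomposition of Gaussian matrices so that only numpy is needed (they are orthogonal, but not drawn from the same distribution as ortho_group.rvs), and the timeit settings are arbitrary. Timings will depend on the machine and library versions.

```python
import timeit
from itertools import accumulate

import numpy as np

def cumdot(A):
    """Plain for-loop cumulative matrix product."""
    B = np.empty(A.shape)
    B[0] = A[0]
    for i in range(1, A.shape[0]):
        B[i] = B[i - 1] @ A[i]
    return B

def cumdot_accumulate(A):
    """Same computation, with the loop hidden inside itertools.accumulate."""
    return np.array(list(accumulate(A, np.matmul)))

rng = np.random.default_rng(0)
N, D = 1000, 4
# The Q factor of a QR decomposition is orthogonal, so the products stay bounded.
A = np.stack([np.linalg.qr(m)[0] for m in rng.standard_normal((N, D, D))])

# Both versions multiply in the same order, so the results should agree.
assert np.allclose(cumdot(A), cumdot_accumulate(A))

for fn in (cumdot, cumdot_accumulate):
    # Best of 5 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(A), number=20, repeat=5)) / 20
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```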
