简体   繁体   中英

Dot product vectorization - NumPy

I have a function which I would like to vectorize away the remaining loop. I believe it is correct and I am happy with performance, but would just like to know more about vectorizing code. The function is:

def f(x, A, c):
    # A : d*d numpy array
    # c : length d numpy array
    # x : N x d or length d numpy array
    x = np.atleast_2d(x)
    b = np.zeros(x.shape[0], dtype=np.bool)
    for row in range(x.shape[0]):
        xmc = x[row, :] - c
        b[row] = xmc.dot(A).dot(xmc) <= 1
    return b

Is it possible to vectorize the function and remove the remaining loop while keeping it reasonably simple? Are there any guidelines for when independent calculations in a loop cannot be vectorized well? Typical values for N and d are 10000 and 4 respectively. Thank you.

You could vectorize like so -

xc = x-c
b_out = ((xc.dot(A))*xc).sum(1) <= 1

You can also use np.einsum -

xc = x-c
b_out = np.einsum('ij,jk,ik->i',xc,A,xc) <= 1

Runtime tests -

Define functions:

def org_app(x, A, c):
    x = np.atleast_2d(x)
    b = np.zeros(x.shape[0], dtype=np.bool)
    for row in range(x.shape[0]):
        xmc = x[row, :] - c
        b[row] = xmc.dot(A).dot(xmc) <= 1
    return b

def vectorized_app1(x,A,c):    
    xc = x-c
    return ((xc.dot(A))*xc).sum(1) <= 1

def vectorized_app2(x,A,c):    
    xc = x-c
    return np.einsum('ij,jk,ik->i',xc,A,xc) <= 1

Timings:

In [266]: N = 20
     ...: d = 20
     ...: A = np.random.rand(d,d)
     ...: c = np.random.rand(d)
     ...: x = np.random.rand(N,d)
     ...: 

In [267]: %timeit org_app(x,A,c)
1000 loops, best of 3: 274 µs per loop

In [268]: %timeit vectorized_app1(x,A,c)
10000 loops, best of 3: 46 µs per loop

In [269]: %timeit vectorized_app2(x,A,c)
10000 loops, best of 3: 63.7 µs per loop

In [270]: N = 100
     ...: d = 100
     ...: A = np.random.rand(d,d)
     ...: c = np.random.rand(d)
     ...: x = np.random.rand(N,d)
     ...: 

In [271]: %timeit org_app(x,A,c)
100 loops, best of 3: 2.74 ms per loop

In [272]: %timeit vectorized_app1(x,A,c)
1000 loops, best of 3: 1.46 ms per loop

In [273]: %timeit vectorized_app2(x,A,c)
100 loops, best of 3: 4.72 ms per loop

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM