
Python fastest way to find similar vector in a large matrix

I'm trying to write a Python function that takes a vector (1x128) and finds the most similar column in a large unsorted matrix (2000x128). This function is called ~100,000 times in my application. There was no problem when I was working on a desktop PC, but it runs very slowly on a Raspberry Pi. Here is my function:

import numpy as np

def find_similar_index(a):
    d = []
    norma = np.linalg.norm(a)
    # Compare the query vector against every column of A
    for i in range(A.shape[1]):
        d.append(np.abs(np.linalg.norm(a - A[:, i])) / norma)
    return np.argmin(d)

Can I improve anything in this function to make it work faster? Can I use the GPU of the Raspberry Pi for this kind of computation?

Here's one approach using broadcasting and np.einsum -

subs = (a[:,None] - A)                        # broadcasted differences, shape (128, 2000)
sq_dist = np.einsum('ij,ij->j',subs, subs)    # squared distance for each column
min_idx = np.abs(sq_dist).argmin()

Another way to get sq_dist is with the (a-b)^2 = a^2 + b^2 - 2ab formula -

sq_dist = (A**2).sum(0) + a.dot(a) - 2*a.dot(A) 
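
If you want to sanity-check that expansion, a quick self-contained comparison against the directly computed squared distances (shapes assumed as in the timings below, i.e. A is (128, 2000)) could look like this -

import numpy as np

A = np.random.rand(128, 2000)                 # candidate columns
a = np.random.rand(128)                       # query vector

direct = ((a[:, None] - A)**2).sum(0)         # squared distances via broadcasting
expanded = (A**2).sum(0) + a.dot(a) - 2*a.dot(A)

print(np.allclose(direct, expanded))          # True, up to floating point error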

With np.einsum, that can be boosted to -

sq_dist = np.einsum('ij,ij->j',A,A) + a.dot(a) - 2*a.dot(A)

Also, since all we are interested in is the index of the closest column, and the distances from np.linalg.norm are non-negative for real-valued inputs, we can skip np.abs and also skip the scaling down by norma (a positive constant that does not change which index is smallest).
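
To illustrate that point (a small check, using the same assumed shapes): the winning index is unchanged by the square root, the np.abs, or the division by the positive constant norma -

import numpy as np

A = np.random.rand(128, 2000)
a = np.random.rand(128)

sq_dist = np.einsum('ij,ij->j', A, A) + a.dot(a) - 2*a.dot(A)
d = np.sqrt(np.abs(sq_dist))                  # np.abs only guards against tiny negative rounding
norma = np.linalg.norm(a)

# Same winning index whether we use squared, true, or scaled distances
print(sq_dist.argmin() == d.argmin() == (d / norma).argmin())   # True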

Runtime test

Approaches -

def app0(a,A): # Original approach
    d = []
    for i in range(0, A.shape[1]):
        d.append(np.linalg.norm(a - A[:, i]))
    return np.argmin(d)

def app1(a,A): # Broadcasting + einsum
    subs = (a[:,None] - A)
    sq_dist = np.einsum('ij,ij->j',subs, subs)
    return sq_dist.argmin()

def app2(a,A): # Expanded (a-b)^2 formula
    sq_dist = (A**2).sum(0) + a.dot(a) - 2*a.dot(A)
    return sq_dist.argmin()

def app3(a,A): # Expanded formula with einsum
    sq_dist = np.einsum('ij,ij->j',A,A) + a.dot(a) - 2*a.dot(A)
    return sq_dist.argmin()

Since you mentioned that the vector has shape (1x128) and you are looking for the column of A most similar to it, each column appears to be of length 128, so I am assuming that A is shaped (128, 2000). With those assumptions, here's a setup and timings for the listed approaches -

In [194]: A = np.random.rand(128,2000)
     ...: a = np.random.rand(128)
     ...: 

In [195]: %timeit app0(a,A)
100 loops, best of 3: 9.21 ms per loop

In [196]: %timeit app1(a,A)
1000 loops, best of 3: 330 µs per loop

In [197]: %timeit app2(a,A)
1000 loops, best of 3: 287 µs per loop

In [198]: %timeit app3(a,A)
1000 loops, best of 3: 291 µs per loop

In [200]: 9210/287.0 # Speedup number
Out[200]: 32.09059233449477
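
One more thought that goes beyond the original timings: since the function is called ~100,000 times, if many of the query vectors are known up front, the same a^2 + b^2 - 2ab idea extends to handling a whole batch of queries with a single matrix multiply, which tends to help far more on a Raspberry Pi than per-call tweaks. A rough sketch (shapes assumed; process the queries in chunks if the full distance matrix is too big for memory) -

import numpy as np

A = np.random.rand(128, 2000)                 # candidate columns
queries = np.random.rand(10000, 128)          # a batch of query vectors, one per row

# Squared distance from every query to every column of A in one shot:
# ||q||^2 + ||A[:,j]||^2 - 2 * q . A[:,j]
sq_dists = (queries**2).sum(1)[:, None] + (A**2).sum(0) - 2*queries.dot(A)
closest = sq_dists.argmin(axis=1)             # most similar column index for each query

print(closest.shape)                          # (10000,)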
