Vectorizing the calculation of Euclidean distance between a matrix and a vector

Question

I want to calculate the Euclidean distance between matrices and a standard vector. All my matrices are stored in a list, let's say, A, so that

A = [[1,2,3],[2,3,4]...,[8,9,10]],

And the standard vector is, let's say, [1,1,1].

I can do this with a for-loop, but it's really time-consuming since there are usually hundreds of matrices in A. How can I vectorize this calculation to shorten the runtime?

Answer 1

import numpy as np

A = np.array([[1,2,3],
              [2,3,4],
              [3,4,5],
              [4,5,6],
              [5,6,7],
              [6,7,8],
              [7,8,9],
              [8,9,10]])

v = np.array([1,1,1])

# Compute the Euclidean norm of the difference between each row of A and v
distance = np.linalg.norm(A - v, axis = 1)
print(distance)

[ 2.23606798  3.74165739  5.38516481  7.07106781  8.77496439 10.48808848
 12.20655562 13.92838828]
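
The subtraction `A - v` works because NumPy broadcasts the length-3 vector across every row of the array, and `axis=1` takes the norm per row. As a rough sanity check, the sketch below compares the vectorized result with an explicit loop on a shortened input. The helper name `dist_loop` is made up for illustration and is not part of the answer.

```python
import numpy as np

def dist_loop(rows, vec):
    # Hypothetical baseline: one Euclidean distance per row, computed in a Python loop
    return np.array([np.sqrt(sum((x - y) ** 2 for x, y in zip(row, vec)))
                     for row in rows])

A = np.array([[1, 2, 3], [2, 3, 4], [8, 9, 10]])
v = np.array([1, 1, 1])

vectorized = np.linalg.norm(A - v, axis=1)   # (3, 3) - (3,) broadcasts row-wise
looped = dist_loop(A, v)

print(np.allclose(vectorized, looped))  # True
```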

Answer 2

Approach #1

Use np.einsum for the distance computations. To solve our case here, we could do -

import numpy as np

def dist_matrix_vec(matrix, vec):
    # Row-wise differences; einsum then sums the squares along each row
    d = np.subtract(matrix, vec)
    return np.sqrt(np.einsum('ij,ij->i', d, d))

Sample run -

In [251]: A = [[1,2,3],[2,3,4],[8,9,10]]

In [252]: B = np.array([1,1,1])

In [253]: dist_matrix_vec(A,B)
Out[253]: array([ 2.23606798,  3.74165739, 13.92838828])
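
The subscript string `'ij,ij->i'` multiplies `d` with itself elementwise and sums over the column index `j`, leaving one squared distance per row `i`. A small check, taking a plain NumPy reduction as the reference, illustrates the equivalence on the sample input:

```python
import numpy as np

d = np.subtract([[1, 2, 3], [2, 3, 4], [8, 9, 10]], [1, 1, 1])

sq_einsum = np.einsum('ij,ij->i', d, d)   # sum over j of d[i, j] * d[i, j]
sq_plain = (d * d).sum(axis=1)            # same reduction, written out

print(sq_einsum)                            # squared distances: 5, 14, 194
print(np.array_equal(sq_einsum, sq_plain))  # True
```

Taking `np.sqrt` of these values gives the distances shown in the sample run.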

Approach #2

When working with large data, we can use the numexpr module, which supports multi-core processing, provided the intended operations can be expressed as arithmetic ones. To solve our case, we can express it like so -

import numpy as np
import numexpr as ne

def dist_matrix_vec_numexpr(matrix, vec): 
    matrix = np.asarray(matrix)
    vec = np.asarray(vec)
    return np.sqrt(ne.evaluate('sum((matrix-vec)**2,1)'))

Timings on large arrays -

In [295]: np.random.seed(0)
     ...: A = np.random.randint(0,9,(10000,3))
     ...: B = np.random.randint(0,9,(3,))

In [296]: %timeit np.linalg.norm(A - B, axis = 1) #@Nathaniel's soln
     ...: %timeit dist_matrix_vec(A,B)
     ...: %timeit dist_matrix_vec_numexpr(A,B)
1000 loops, best of 3: 244 µs per loop
10000 loops, best of 3: 131 µs per loop
10000 loops, best of 3: 96.5 µs per loop

In [297]: np.random.seed(0)
     ...: A = np.random.randint(0,9,(100000,3))
     ...: B = np.random.randint(0,9,(3,))

In [298]: %timeit np.linalg.norm(A - B, axis = 1) #@Nathaniel's soln
     ...: %timeit dist_matrix_vec(A,B)
     ...: %timeit dist_matrix_vec_numexpr(A,B)
100 loops, best of 3: 5.31 ms per loop
1000 loops, best of 3: 1.43 ms per loop
1000 loops, best of 3: 918 µs per loop

The numexpr-based one was run with 8 threads. Thus, with more threads available for compute, it should improve further. Related post on how to control multi-core functionality.
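
The `%timeit` runs above require IPython. For a plain Python script, a comparable measurement of the two NumPy-only variants could be sketched with the standard-library `timeit` module. The repeat counts below are arbitrary choices, and the numbers will differ by machine, so no timings are claimed here.

```python
import timeit
import numpy as np

def dist_matrix_vec(matrix, vec):
    # Same einsum-based function as in Approach #1
    d = np.subtract(matrix, vec)
    return np.sqrt(np.einsum('ij,ij->i', d, d))

np.random.seed(0)
A = np.random.randint(0, 9, (10000, 3))
B = np.random.randint(0, 9, (3,))

candidates = [
    ("linalg.norm", lambda: np.linalg.norm(A - B, axis=1)),
    ("einsum", lambda: dist_matrix_vec(A, B)),
]

# Best of 3 repeats, 100 calls each, reported per call in microseconds
for label, fn in candidates:
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{label}: {best * 1e6:.1f} µs per call")
```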

Answer 1: 1, accepted, 2019-03-23 19:17:06

Answer 2: 1, 2019-03-24 09:42:27
