
Vectorized matrix Manhattan distance in numpy

I'm trying to implement an efficient vectorized numpy function that builds a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products, as follows:

import numpy as np

A = np.array([[1, 2],
              [2, 1]])

B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

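def euclidean_distmtx(X, Y):
    # Cross term: -2 * (x . y) for every pair of rows
    f = -2 * np.dot(X, Y.T)
    # Squared row norms; xsq is a column so it broadcasts against ysq
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    return np.sqrt(xsq + f + ysq)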
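
For reference, the construct above relies on the identity ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2. Here is a minimal sketch that checks the dot-product form against a direct broadcasting computation. The float conversion and the `np.maximum` clamp are assumptions added here to guard against tiny negative values from rounding; they are not part of the function above.

```python
import numpy as np

def euclidean_distmtx(X, Y):
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    cross = -2 * np.dot(X, Y.T)                 # shape (m, n)
    xsq = (X ** 2).sum(axis=1).reshape(-1, 1)   # shape (m, 1)
    ysq = (Y ** 2).sum(axis=1)                  # shape (n,)
    # Clamp at zero before the square root (added safeguard, an assumption)
    return np.sqrt(np.maximum(xsq + cross + ysq, 0.0))

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])

# Direct pairwise computation for comparison
direct = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
fast = euclidean_distmtx(A, B)
print(np.allclose(fast, direct))
```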

I want to implement something similar, but using Manhattan distance instead. So far I've come close, but I fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is

$$\sum_i |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2| + \dots$$

I tried to solve this by considering what happens if the absolute value isn't applied at all, which gives me this equivalence

$$\sum_i (x_i - y_i) = \sum_i x_i - \sum_i y_i$$

which gives me the following vectorization

def manhattan_distmtx(X, Y):
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)

I think I'm on the right track, but I can't move the values around without removing the absolute value around the difference between each pair of vector elements. I'm sure there's a clever trick around the absolute values, possibly using np.sqrt of a squared value or something, but I can't seem to work it out.
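
A small sketch of why dropping the absolute value changes the result: signed differences can cancel, so the sum of differences is not the Manhattan distance. The vectors `x` and `y` are just the two rows of A above, used for illustration.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([2, 1])

signed = (x - y).sum()             # (1 - 2) + (2 - 1): the terms cancel
split_sums = x.sum() - y.sum()     # same value as the signed sum
manhattan = np.abs(x - y).sum()    # |1 - 2| + |2 - 1|

print(signed, split_sums, manhattan)  # 0 0 2
```

Writing |a| as the square root of a squared value does not help either, because the square root still has to be taken per element before summing, so the sum does not split into separate terms for x and y.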

I don't think we can leverage BLAS-based matrix multiplication here, as there's no element-wise multiplication involved. But we have a few alternatives.

Approach #1

We can use Scipy's cdist, which provides the Manhattan distance when its optional metric argument is set to 'cityblock':

from scipy.spatial.distance import cdist

out = cdist(A, B, metric='cityblock')
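
As a self-contained sketch, here is the same call applied to the A and B arrays from the question, assuming they are converted to numpy arrays first:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])

# out[i, j] is the Manhattan distance between A[i] and B[j]
out = cdist(A, B, metric='cityblock')
print(out)
# [[1. 1. 1. 2.]
#  [1. 1. 3. 4.]]
```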

Approach #2 - A

We can also leverage broadcasting, but with higher memory requirements:

np.abs(A[:,None] - B).sum(-1)
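
To make the memory cost visible, this sketch spells out the intermediate shapes for the question's A and B. The explicit `None` on B is only for readability and is equivalent to the one-liner above.

```python
import numpy as np

A = np.array([[1, 2], [2, 1]])                   # shape (2, 2)
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])   # shape (4, 2)

# (2, 1, 2) - (1, 4, 2) broadcasts to (2, 4, 2): one difference
# vector per (row of A, row of B) pair, all held in memory at once
diff = A[:, None, :] - B[None, :, :]

# Summing absolute differences over the last axis gives shape (2, 4)
out = np.abs(diff).sum(axis=-1)
print(diff.shape, out.shape)  # (2, 4, 2) (2, 4)
```

The intermediate array has m * n * d elements for inputs of shape (m, d) and (n, d), which is where the extra memory goes.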

Approach #2 - B

For input arrays with two columns, that could be rewritten to use less memory by slicing each column and summing the results:

np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])
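
The same idea could be extended to any number of columns with a loop over the column index. This is a sketch, not part of the original answer: the function name `manhattan_distmtx` is borrowed from the question, and the float accumulator is an assumption made for simplicity.

```python
import numpy as np

def manhattan_distmtx(X, Y):
    X = np.asarray(X)
    Y = np.asarray(Y)
    # Accumulate one (m, n) slab per column instead of building (m, n, d)
    out = np.zeros((X.shape[0], Y.shape[0]), dtype=float)
    for k in range(X.shape[1]):
        out += np.abs(X[:, k, None] - Y[:, k])
    return out

A = np.array([[1, 2], [2, 1]])
B = np.array([[1, 1], [2, 2], [1, 3], [1, 4]])
result = manhattan_distmtx(A, B)
print(result)
```

This trades a Python-level loop over the columns for a peak memory footprint of roughly one (m, n) array, so it is likely to pay off mainly when the number of columns is small relative to the number of rows.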

Approach #2 - C

Porting the broadcasting version over to the numexpr module to make use of its faster absolute-value computation:

import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')
