Vectorized matrix Manhattan distance in numpy
I'm trying to implement an efficient vectorized numpy solution to make a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products as follows:
A = [[1, 2],
     [2, 1]]
B = [[1, 1],
     [2, 2],
     [1, 3],
     [1, 4]]
def euclidean_distmtx(X, Y):
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    return np.sqrt(xsq + f + ysq)
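As a quick sanity check (my addition, treating the A and B above as numpy arrays), the Euclidean construct can be run standalone; it expands ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2 pairwise:

```python
import numpy as np

def euclidean_distmtx(X, Y):
    # -2 * x.y cross term of the expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    return np.sqrt(xsq + f + ysq)

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

# Each entry [i, j] is the Euclidean distance between A[i] and B[j]
print(euclidean_distmtx(A, B))
```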
I want to implement something similar but using Manhattan distance instead. So far I've got close, but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is

d(x, y) = sum_i |x_i - y_i|
I tried to solve this by considering what would happen if the absolute function didn't apply at all, giving me this equivalence

sum_i (x_i - y_i) = sum_i x_i - sum_i y_i

which gives me the following vectorization
def manhattan_distmtx(X, Y):
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)
I think I'm on the right track, but I just can't move the values around without removing the absolute function around the difference between each pair of vector elements. I'm sure there's a clever trick around the absolute values, possibly by taking np.sqrt of a squared value or something, but I can't seem to realize it.
I don't think we can leverage BLAS-based matrix-multiplication here, as there's no element-wise multiplication involved. But we have a few alternatives.
Approach #1
We can use Scipy's cdist, which features the Manhattan distance when its optional metric argument is set as 'cityblock' -
from scipy.spatial.distance import cdist
out = cdist(A, B, metric='cityblock')
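For reference, a self-contained run with the A and B from the question as numpy arrays:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

# out[i, j] = sum(|A[i, k] - B[j, k]| over k) -- the Manhattan / cityblock distance
out = cdist(A, B, metric='cityblock')
print(out)
```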
Approach #2 - A
We can also leverage broadcasting, but with more memory requirements -
np.abs(A[:,None] - B).sum(-1)
Approach #2 - B
That could be re-written to use less memory, with slicing and summations, for input arrays with two cols -
np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])
Approach #2 - C
Porting over the broadcasting version to make use of faster absolute computation with the numexpr module -
import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')
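A quick verification (my addition) that the listed approaches agree on the sample inputs; the numexpr variant is omitted here since it needs an extra dependency, but it produces the same matrix:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

ref = cdist(A, B, metric='cityblock')       # Approach #1
bcast = np.abs(A[:, None] - B).sum(-1)      # Approach #2 - A
sliced = (np.abs(A[:, 0, None] - B[:, 0])   # Approach #2 - B
          + np.abs(A[:, 1, None] - B[:, 1]))

print(np.allclose(ref, bcast), np.allclose(ref, sliced))
```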