Compute the pairwise distance between each pair of the two collections of inputs in TensorFlow
I have two collections. One consists of $m_1$ points in $k$ dimensions and the other of $m_2$ points in $k$ dimensions. I need to calculate the pairwise distance between each pair of points drawn from the two collections.
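Basically, given two matrices $A \in \mathbb{R}^{m_1 \times k}$ and $B \in \mathbb{R}^{m_2 \times k}$, I need to get a matrix $C \in \mathbb{R}^{m_1 \times m_2}$ whose entry $C_{ij}$ is the distance between row $i$ of $A$ and row $j$ of $B$.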
I can easily do this in SciPy using scipy.spatial.distance.cdist and pick one of many distance metrics, and I can also do it in TF with a loop, but I can't figure out how to do it with matrix manipulations, even for Euclidean distance.
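For reference, one loop-free way to reproduce several of the metrics cdist offers is to broadcast the row differences into an $m_1 \times m_2 \times k$ tensor and reduce over the last axis. The sketch below is in NumPy so it is easy to check; the function name `pairwise_broadcast` is hypothetical and its metric strings simply borrow SciPy's names. The same idea should carry over to TF with `tf.expand_dims`, `tf.reduce_sum`, `tf.abs` and `tf.reduce_max`. The price is memory: the intermediate tensor grows as $m_1 m_2 k$.

```python
import numpy as np


def pairwise_broadcast(a, b, metric="euclidean"):
    """Hypothetical helper: C[i, j] = dist(a[i], b[j]) without a Python loop.

    a: (m1, k) array, b: (m2, k) array -> (m1, m2) array.
    Builds an (m1, m2, k) difference tensor, so memory grows as m1 * m2 * k.
    """
    diff = a[:, None, :] - b[None, :, :]  # (m1, m2, k): diff[i, j] = a[i] - b[j]
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "sqeuclidean":
        return np.sum(diff ** 2, axis=-1)
    if metric == "cityblock":  # Manhattan / L1
        return np.sum(np.abs(diff), axis=-1)
    if metric == "chebyshev":  # L-infinity
        return np.max(np.abs(diff), axis=-1)
    raise ValueError(f"unsupported metric: {metric}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((3, 2))
    b = rng.random((4, 2))
    print(pairwise_broadcast(a, b).shape)  # (3, 4)
```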
After a few hours I finally found how to do this in TensorFlow. My solution works only for Euclidean distance and is pretty verbose. I also do not have a mathematical proof (just a lot of handwaving, which I hope to make more rigorous):
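import numpy as np
import tensorflow as tf
from scipy.spatial.distance import cdist

M1, M2, K = 3, 4, 2

# SciPy reference calculation
a = np.random.rand(M1, K).astype(np.float32)
b = np.random.rand(M2, K).astype(np.float32)
print(cdist(a, b, 'euclidean'), '\n')

# TF calculation
A = tf.Variable(a)
B = tf.Variable(b)

# p1[i, j] = ||A_i||^2, repeated across the M2 columns
p1 = tf.matmul(
    tf.expand_dims(tf.reduce_sum(tf.square(A), 1), 1),
    tf.ones(shape=(1, M2))
)

# p2[i, j] = ||B_j||^2, repeated across the M1 rows
p2 = tf.transpose(tf.matmul(
    tf.reshape(tf.reduce_sum(tf.square(B), 1), shape=[-1, 1]),
    tf.ones(shape=(M1, 1)),
    transpose_b=True
))

# ||A_i - B_j||^2 = ||A_i||^2 + ||B_j||^2 - 2 A_i . B_j
# Clamp at 0 so floating-point rounding cannot feed a negative value to sqrt (NaN).
res = tf.sqrt(tf.maximum(tf.add(p1, p2) - 2 * tf.matmul(A, B, transpose_b=True), 0.0))

# TF 1.x graph-mode API; under TF 2.x use tf.compat.v1.Session and
# tf.compat.v1.global_variables_initializer after tf.compat.v1.disable_eager_execution().
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(res))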
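Why the TF code above works: for row $a_i$ of $A$ and row $b_j$ of $B$, $\lVert a_i - b_j \rVert^2 = (a_i - b_j)^\top (a_i - b_j) = a_i^\top a_i - 2\,a_i^\top b_j + b_j^\top b_j$. In the code, p1 holds $a_i^\top a_i$ in every column $j$, p2 holds $b_j^\top b_j$ in every row $i$, and tf.matmul(A, B, transpose_b=True) supplies $a_i^\top b_j$. Below is a minimal NumPy sketch of the same expansion (the name `pairwise_euclidean` is hypothetical), where broadcasting a column vector against a row vector replaces the multiplications by ones. Unlike the broadcast-the-differences approach, it never materialises an $m_1 \times m_2 \times k$ tensor.

```python
import numpy as np


def pairwise_euclidean(a, b):
    """Hypothetical helper: C[i, j] = ||a[i] - b[j]|| via ||a||^2 + ||b||^2 - 2 a.b."""
    sq_a = np.sum(a * a, axis=1)[:, None]  # (m1, 1): role of p1, broadcast over columns
    sq_b = np.sum(b * b, axis=1)[None, :]  # (1, m2): role of p2, broadcast over rows
    cross = a @ b.T                        # (m1, m2): the matmul(A, B, transpose_b=True) term
    sq = sq_a + sq_b - 2.0 * cross
    # Cancellation can leave tiny negative values for (near-)identical points.
    return np.sqrt(np.maximum(sq, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((3, 2))
    b = rng.random((4, 2))
    print(pairwise_euclidean(a, b))
```

In TF the same steps should map onto tf.reduce_sum(tf.square(A), axis=1, keepdims=True), tf.reduce_sum(tf.square(B), axis=1)[tf.newaxis, :], tf.matmul(A, B, transpose_b=True), tf.maximum and tf.sqrt.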
This will do it for tensors of arbitrary dimensionality (i.e. containing (..., N, d) vectors). Note that it isn't between collections (i.e. not like scipy.spatial.distance.cdist); it's instead within a single batch of vectors (i.e. like scipy.spatial.distance.pdist, except that it returns the full N × N matrix, as squareform(pdist(x)) would, rather than pdist's condensed vector).
import string

import tensorflow as tf


def pdist(arr):
    """Pairwise Euclidean distances between vectors contained at the back of tensors.

    Uses the expansion: (x - y)^T (x - y) = x^Tx - 2x^Ty + y^Ty

    :param arr: (..., N, d) tensor (N must be known statically)
    :returns: (..., N, N) tensor of pairwise distances between vectors in the second-to-last dim.
    :rtype: tf.Tensor
    """
    shape = tuple(arr.get_shape().as_list())
    rank_ = len(shape)
    N, d = shape[-2:]

    # Build an einsum prefix (a, b, c, ...) for the leading batch axes.
    pref = string.ascii_lowercase[:rank_ - 2]

    # Gram matrix: inner products x_n . x_m for every pair of points (..., N, N)
    xxT = tf.einsum('{0}ni,{0}mi->{0}nm'.format(pref), arr, arr)

    # Squared norm x_n . x_n of each point (..., N)
    xTx = tf.einsum('{0}ni,{0}ni->{0}n'.format(pref), arr, arr)

    # (..., N, N) squared norms tiled along the last axis.
    xTx_tile = tf.tile(xTx[..., None], (1,) * (rank_ - 1) + (N,))

    # Build the permuter. (sigh, no tf.swapaxes yet)
    permute = list(range(rank_))
    permute[-2], permute[-1] = permute[-1], permute[-2]

    # dists = (x^Tx - 2x^Ty + y^Ty)^(1/2). Note the axis swapping is necessary to 'pair' x^Tx and y^Ty.
    # Clamp at 0 so rounding error (e.g. on the diagonal) cannot produce NaN in sqrt.
    return tf.sqrt(tf.maximum(xTx_tile - 2 * xxT + tf.transpose(xTx_tile, permute), 0.0))
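The pdist answer compares a batch with itself, but the same einsum expansion extends to two different collections, which is what the question asks for. Below is a minimal NumPy sketch under that assumption; the name `batched_cdist` is hypothetical, and it uses einsum's `...` ellipsis instead of a generated letter prefix, so the leading batch axes of the two inputs only need to broadcast against each other. TF 2.x's tf.einsum should accept the same ellipsis subscripts, but treat that translation as untested.

```python
import numpy as np


def batched_cdist(x, y):
    """Hypothetical helper: Euclidean distances between x (..., N, d) and y (..., M, d) -> (..., N, M)."""
    xy = np.einsum("...ni,...mi->...nm", x, y)              # cross inner products x_n . y_m
    xx = np.einsum("...ni,...ni->...n", x, x)[..., :, None]  # (..., N, 1) squared norms of x
    yy = np.einsum("...mi,...mi->...m", y, y)[..., None, :]  # (..., 1, M) squared norms of y
    # Clamp before sqrt: cancellation can give tiny negatives for identical points.
    return np.sqrt(np.maximum(xx - 2.0 * xy + yy, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.random((2, 3, 4))  # batch of 2, N=3 points, d=4
    y = rng.random((2, 5, 4))  # batch of 2, M=5 points, d=4
    print(batched_cdist(x, y).shape)  # (2, 3, 5)
```

With y equal to x this gives the same square matrices as pdist above, with zeros on the diagonal up to rounding.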
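Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.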