
Soft cosine distance between two vectors (Python)

I am wondering if there is a good way to calculate the soft cosine distance between two vectors of numbers. So far, I have only seen solutions for sentences, which unfortunately did not help me.

Say I have two vectors like this:

a = [0,.25,.25,0,.5]
b = [.5,.0,.0,0.25,.25]

Now, I know that the features in the vectors exhibit some degree of similarity among them. This is described via:

s = [[0,.67,.25,0.78,.53],
     [.53,0,.33,0.25,.25],
     [.45,.33,0,0.25,.25],
     [.85,.04,.11,0,0.25],
     [.95,.33,.44,0.25,0]]

So a and b are 1x5 vectors, and s is a 5x5 matrix describing how similar the features in a and b are.

Now, I would like to calculate the soft cosine distance between a and b, but accounting for between-feature similarity. I found this formula, which should calculate what I need: soft cosine formula
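For reference, the soft cosine similarity behind that formula is usually written as

    similarity(a, b) = (a^T · S · b) / ( sqrt(a^T · S · a) · sqrt(b^T · S · b) )

where S is the feature-similarity matrix, and the soft cosine distance is then 1 - similarity(a, b).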

I already tried implementing it using numpy:

import numpy as np

soft_cosine = 1 - (np.dot(a,np.dot(s,b)) / (np.sqrt(np.dot(a,np.dot(s,b))) * np.sqrt(np.dot(a,np.dot(s,b)))))

It is supposed to produce a number between 0 and 1, with a higher number indicating a higher distance between a and b. However, I am running this on a larger dataframe with multiple vectors a and b, and for some of them it produces negative values. Clearly, I am doing something wrong.

Any help is greatly appreciated, and I am happy to clarify whatever needs clarification!

Best, Johannes

From what I see, it may just be a formula error. Could you please try with mine?

soft_cosine = a @ (s@b) / np.sqrt( (a @ (s@a) ) * (b @ (s@b) ) )

I use the @ operator (which is a shorthand for np.matmul on ndarrays), as I find it cleaner to write: it's just matrix multiplication, no matter if 1D or 2D. It is a simple way to compute a dot product between two 1D arrays, with less code than the usual np.dot function.
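For completeness, here is a minimal runnable sketch of this formula using the vectors from the question; note that a, b and s need to be NumPy arrays rather than plain lists for @ to work:

import numpy as np

# vectors and feature-similarity matrix from the question
a = np.array([0, .25, .25, 0, .5])
b = np.array([.5, .0, .0, 0.25, .25])
s = np.array([[0, .67, .25, 0.78, .53],
              [.53, 0, .33, 0.25, .25],
              [.45, .33, 0, 0.25, .25],
              [.85, .04, .11, 0, 0.25],
              [.95, .33, .44, 0.25, 0]])

# numerator mixes a and b; each square root in the denominator
# normalises one vector against itself
similarity = a @ (s @ b) / np.sqrt((a @ (s @ a)) * (b @ (s @ b)))
distance = 1 - similarity
print(distance)

Note that the usual [0, 1] bound on this distance relies on s being a proper similarity matrix (symmetric, positive semi-definite, typically with 1s on the diagonal); the example matrix above is neither symmetric nor unit-diagonal, so even the corrected formula can produce values outside that range for it.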

soft_cosine = 1 - (np.dot(a,np.dot(s,b)) / (np.sqrt(np.dot(a,np.dot(s,b))) * np.sqrt(np.dot(a,np.dot(s,b)))))

I think you just need to change the denominator: the first square root should pair "a" with "a", and the second should pair "b" with "b".

soft_cosine = 1 - (np.dot(a,np.dot(s,b)) / (np.sqrt(np.dot(a,np.dot(s,a))) * np.sqrt(np.dot(b,np.dot(s,b)))))
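As a quick sanity check (assuming the NumPy arrays a, b and s defined in the sketch above), this expression agrees with the first answer's formula once that one is turned into a distance with 1 - ...:

# both corrected expressions pair a with a and b with b in the denominator
d1 = 1 - a @ (s @ b) / np.sqrt((a @ (s @ a)) * (b @ (s @ b)))
d2 = 1 - (np.dot(a, np.dot(s, b)) /
          (np.sqrt(np.dot(a, np.dot(s, a))) * np.sqrt(np.dot(b, np.dot(s, b)))))
print(np.isclose(d1, d2))  # True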
