How to calculate the normalized Euclidean distance between two vectors?

Let's say I have the following two vectors:

x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];

The first seven elements are continuous values in the range [1,10]. The last element is an integer in the range [1,10].

Now I would like to compute the Euclidean distance between x and y. I think the integer element is a problem: all the other elements can get arbitrarily close to each other, but the integer element always differs in steps of one. So there is a bias towards the integer element.

How can I calculate something like a normalized Euclidean distance on it?

According to Wolfram Alpha and the following answer from Cross Validated, the normalized Euclidean distance is defined by:

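[Formula image not preserved. The MATLAB expression that follows corresponds to 0.5 * var(x - y) / (var(x) + var(y)).]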

You can calculate it in MATLAB by using:

0.5*(std(x-y)^2) / (std(x)^2+std(y)^2)

Alternatively, you can use:

0.5*((norm((x-mean(x))-(y-mean(y)))^2)/(norm(x-mean(x))^2+norm(y-mean(y))^2))
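
The two expressions should give the same value: std(v)^2 is the sample variance, norm(v - mean(v))^2 is that variance times (N-1), and the (N-1) factor cancels between numerator and denominator. A minimal sketch to check this numerically, assuming x and y are column vectors of equal length as in the question (the names d_std, d_norm, xc and yc are only illustrative):

```matlab
rng(0);                                        % fixed seed so the run is repeatable
x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];

% Variance form: std(v)^2 is the sample variance of v
d_std = 0.5 * (std(x - y)^2) / (std(x)^2 + std(y)^2);

% Norm form: norm(v - mean(v))^2 equals (N-1) * var(v), and the (N-1) cancels
xc = x - mean(x);
yc = y - mean(y);
d_norm = 0.5 * (norm(xc - yc)^2) / (norm(xc)^2 + norm(yc)^2);

fprintf('variance form: %.6f\nnorm form:     %.6f\n', d_std, d_norm);
```

Because norm(xc - yc)^2 can never exceed 2*(norm(xc)^2 + norm(yc)^2), the result always lies between 0 and 1.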

I would rather normalise x and y before calculating the distance; then the plain Euclidean distance would suffice.

In your example:

x_norm = (x - 1) / 9;         % rescale x from [1,10] to [0,1]
y_norm = (y - 1) / 9;         % rescale y from [1,10] to [0,1]
dist = norm(x_norm - y_norm); % Euclidean distance between normalised x, y
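
The constants 1 and 9 come from the known range [1,10]. A sketch of the same idea with explicit per-element bounds, in case the elements do not all share one range; the vectors lo and hi and the extra rescaling by sqrt(numel(x)) are assumptions added for illustration, not part of the answer above:

```matlab
x = [2.5; 7.1; 9.9; 1.0; 4.4; 6.3; 8.8; 3];   % example values in [1,10]
y = [3.0; 6.9; 1.2; 10.0; 4.5; 2.3; 8.0; 7];

lo = ones(8,1);        % lower bound of each element (assumed known)
hi = 10 * ones(8,1);   % upper bound of each element (assumed known)

x_norm = (x - lo) ./ (hi - lo);   % each element now lies in [0,1]
y_norm = (y - lo) ./ (hi - lo);

dist = norm(x_norm - y_norm);          % identical to (x-1)/9 scaling when all bounds are [1,10]
dist_unit = dist / sqrt(numel(x));     % optional: rescale so the result lies in [0,1]
```

Note that this rescaling does not remove the granularity of the integer element: after scaling it still moves in steps of 1/9, while the continuous elements can differ by any amount.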

However, I am not sure whether having an integer element introduces some sort of bias, but we have already drifted somewhat off-topic for Stack Overflow :)

From Euclidean Distance - raw, normalized and double‐scaled coefficients:

SYSTAT, Primer 5, and SPSS provide Normalization options for the data so as to permit an investigator to compute a distance coefficient which is essentially “scale free”. Systat 10.2’s normalised Euclidean distance produces its “normalisation” by dividing each squared discrepancy between attributes or persons by the total number of squared discrepancies (or sample size).

[Image: normalized Euclidean distance formula]

Frankly, I can see little point in this standardization – as the final coefficient still remains scale‐sensitive. That is, it is impossible to know whether the value indicates high or low dissimilarity from the coefficient value alone.
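
To make the objection concrete, here is a minimal sketch. It assumes the coefficient described in the quote is the square root of the mean squared difference, i.e. the raw Euclidean distance divided by sqrt(n); that reading is an assumption, since the original formula is not reproduced here. The variable names are illustrative.

```matlab
x = [2.5; 7.1; 9.9; 1.0; 4.4; 6.3; 8.8; 3];
y = [3.0; 6.9; 1.2; 10.0; 4.5; 2.3; 8.0; 7];
n = numel(x);

d_raw  = norm(x - y);                  % raw Euclidean distance
d_mean = sqrt(sum((x - y).^2) / n);    % assumed "normalised" form: squared differences divided by n

% Expressing the same data in units ten times smaller multiplies the coefficient by ten
d_mean_scaled = sqrt(sum((10*x - 10*y).^2) / n);

fprintf('raw: %.4f, per-n: %.4f, per-n after 10x rescale: %.4f\n', ...
        d_raw, d_mean, d_mean_scaled);
```

Dividing by n only removes the dependence on the number of elements; the value still grows with the units of the data, which is why it cannot be read as "high" or "low" on its own.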
