Euclidean distance matrix from a Python DataFrame

Question

I would like to create my own customized k-nearest-neighbor method.

For this I need a matrix (x : y) that returns the distance for each combination of rows x and y under a given distance function (e.g. Euclidean, computed on 7 items of my dataset).
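Written as a formula, with S the set of selected columns and x_{ic} the value of column c in row i (notation introduced here for illustration), each entry of the requested Euclidean matrix is:

```latex
D_{ij} = \sqrt{\sum_{c \in S} \left( x_{ic} - x_{jc} \right)^{2}},
\qquad D_{ii} = 0,
\qquad D_{ij} = D_{ji}
```

Because the matrix is symmetric with a zero diagonal, only the n(n-1)/2 entries above the diagonal actually need to be computed.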

For example:

data:
          x1  x2  x3
  row 1:   1   2   3
  row 2:   1   1   1
  row 3:   4   2   3

If I select x1 and x2 and the Euclidean distance, the output should be a 3x3 matrix:

1:1 = 0
1:2 = sqrt((1-1)^2 + (2-1)^2) = 1
1:3 = sqrt((1-4)^2 + (2-2)^2) = sqrt(9) = 3
2:1 = 1:2 = 1
2:2 = 0
2:3 = sqrt((1-4)^2 + (1-2)^2) = sqrt(10) ≈ 3.162
3:3 = 0

and so forth.
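The pairwise values above can be checked with `math.dist` from the standard library (Python 3.8+). The tuples below are simply the x1 and x2 values of the three sample rows:

```python
import math

# x1 and x2 values of the three sample rows, keyed by row number
rows = {1: (1, 2), 2: (1, 1), 3: (4, 2)}

d12 = math.dist(rows[1], rows[2])  # sqrt(0^2 + 1^2) = 1
d13 = math.dist(rows[1], rows[3])  # sqrt(3^2 + 0^2) = 3
d23 = math.dist(rows[2], rows[3])  # sqrt(3^2 + 1^2) = sqrt(10)

print(d12, d13, d23)
```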

How can I write this without iterating through the DataFrame?
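Without SciPy, the same matrix can be built with NumPy broadcasting, which avoids an explicit Python loop over the rows. This is a minimal sketch; the helper name `euclidean_matrix` and the `df` construction are assumptions for illustration. The intermediate array has shape (n, n, k), so memory grows quadratically with the number of rows.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def euclidean_matrix(df: pd.DataFrame, cols: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: pairwise Euclidean distances between all rows, using only `cols`."""
    X = df[list(cols)].to_numpy(dtype=float)   # shape (n, k)
    diff = X[:, None, :] - X[None, :, :]       # shape (n, n, k): every row minus every row
    D = np.sqrt((diff ** 2).sum(axis=-1))      # shape (n, n)
    return pd.DataFrame(D, index=df.index, columns=df.index)


# Sample data from the question, indexed by row number
df = pd.DataFrame({"x1": [1, 1, 4], "x2": [2, 1, 2], "x3": [3, 1, 3]},
                  index=[1, 2, 3])
D = euclidean_matrix(df, ["x1", "x2"])
print(D)
```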

Thanks in advance for your support.

Answer 1

You can use scipy.spatial.distance.pdist and scipy.spatial.distance.squareform:

import pandas as pd
from scipy.spatial.distance import pdist, squareform

dist = pdist(df[['x1', 'x2']], 'euclidean')
df_dist = pd.DataFrame(squareform(dist))
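A self-contained version of the answer might look like this. The `df` construction is an assumption, since the question only shows the values; passing `index` and `columns` keeps the original row labels on the result instead of the default 0..n-1:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample data from the question, indexed by row number
df = pd.DataFrame({"x1": [1, 1, 4], "x2": [2, 1, 2], "x3": [3, 1, 3]},
                  index=[1, 2, 3])

# pdist returns the condensed form: only the n*(n-1)/2 distances above the diagonal
dist = pdist(df[["x1", "x2"]], "euclidean")

# squareform mirrors that vector into the full symmetric n x n matrix (zero diagonal)
df_dist = pd.DataFrame(squareform(dist), index=df.index, columns=df.index)

print(dist)
print(df_dist)
```

The condensed vector lists the pairs in the order (1,2), (1,3), (2,3), so here it holds 1, 3 and sqrt(10).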

If you just want an array as your output rather than a DataFrame, use squareform by itself without wrapping it in a DataFrame.
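For the plain-array route, and for the "given function" part of the question, `pdist` also accepts a Python callable that takes two 1-D arrays and returns a number. The weighted Euclidean metric below, with its `WEIGHTS` values, is a hypothetical example of such a custom function, not something from the original answer:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# x1 and x2 columns of the sample data as a plain float array
X = np.array([[1, 2], [1, 1], [4, 2]], dtype=float)

# Built-in metric, returned as a plain NumPy array (no DataFrame wrapper)
D = squareform(pdist(X, "euclidean"))

# Hypothetical per-column weights for an illustrative custom metric
WEIGHTS = np.array([1.0, 2.0])


def weighted_euclidean(u, v):
    """Illustrative custom distance: Euclidean with per-column weights."""
    return np.sqrt(np.sum(WEIGHTS * (u - v) ** 2))


# pdist calls the function once for every pair of rows
D_custom = squareform(pdist(X, weighted_euclidean))

print(D)
print(D_custom)
```

A Python callable is invoked once per pair, so it is considerably slower than the built-in string metrics on large inputs.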

The resulting output (as a DataFrame):

     0         1         2
0  0.0  1.000000  3.000000
1  1.0  0.000000  3.162278
2  3.0  3.162278  0.000000
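Since the goal in the question is a custom k-nearest-neighbor method, one possible next step (a sketch, not part of the original answer) is to rank each row of the distance matrix. The hypothetical `k_nearest` helper below sets the diagonal to infinity so that a point is not counted as its own neighbor, then sorts each row; a stable sort is used so that ties resolve by index:

```python
import numpy as np

# The distance matrix from the output above (rows and columns 0, 1, 2)
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, np.sqrt(10)],
              [3.0, np.sqrt(10), 0.0]])


def k_nearest(dist_matrix, k):
    """Hypothetical helper: indices of the k nearest neighbors of every row, excluding itself."""
    M = np.array(dist_matrix, dtype=float)       # copy, so the caller's matrix is untouched
    np.fill_diagonal(M, np.inf)                  # a point should not be its own neighbor
    return np.argsort(M, axis=1, kind="stable")[:, :k]  # column j = (j+1)-th nearest


print(k_nearest(D, 2))
```

For large matrices, `np.argpartition` can select the k smallest entries per row without a full sort, although its result is not ordered by distance.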

Answer 1 (score 5, accepted, 2016-11-29 17:00:50)