
Pandas - Euclidean Distance Between Columns

I have a DataFrame as follows:

           uuid        x_1         y_1         x_2         y_2
0        di-ab5      82.31      184.20      148.06      142.54  
1        di-de6      92.35      185.21       24.12       16.45
2        di-gh7     123.45        0.01         NaN         NaN 
...

I am trying to calculate the Euclidean distance between [x_1, y_1] and [x_2, y_2] and store it in a new column (the values in this example are illustrative, not real):

           uuid       dist
0        di-ab5      12.31    
1        di-de6      62.35   
2        di-gh7        NaN

Caveats:

  1. Some rows have NaN in some of the coordinates.
  2. It is okay to represent the data in the original DataFrame as points (i.e. [1.23, 4.56]) instead of splitting up the x and y coordinates.

I am currently using the following script:

import numpy as np

df['dist'] = np.sqrt((df['x_1'] - df['x_2'])**2 + (df['y_1'] - df['y_2'])**2)

But it seems verbose and often fails. Is there a better way to do this using pandas, NumPy, or SciPy?
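For trying out the answers below, here is a minimal sketch that rebuilds the three sample rows shown above and applies the original formula. The question does not say why the script fails; the pd.to_numeric step is only an assumption that some coordinates may arrive as non-numeric values (for example strings), which would make the subtraction raise. With clean float columns, the formula already returns NaN for the row with missing coordinates:

```python
import numpy as np
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({
    'uuid': ['di-ab5', 'di-de6', 'di-gh7'],
    'x_1': [82.31, 92.35, 123.45],
    'y_1': [184.20, 185.21, 0.01],
    'x_2': [148.06, 24.12, np.nan],
    'y_2': [142.54, 16.45, np.nan],
})

coords = ['x_1', 'y_1', 'x_2', 'y_2']
# Assumption: failures might come from non-numeric entries; coerce them to
# float so invalid values become NaN instead of raising a TypeError.
df[coords] = df[coords].apply(pd.to_numeric, errors='coerce')

# Original formula: NaN in any coordinate propagates to the distance
df['dist'] = np.sqrt((df['x_1'] - df['x_2'])**2 + (df['y_1'] - df['y_2'])**2)
print(df[['uuid', 'dist']])
```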

You can use np.linalg.norm, i.e.:

# Columns 1-2 are (x_1, y_1) and 3-4 are (x_2, y_2); norm along axis=1 gives one distance per row
df['dist'] = np.linalg.norm(df.iloc[:, [1, 2]].to_numpy() - df.iloc[:, [3, 4]].to_numpy(), axis=1)

Output:

     uuid     x_1     y_1     x_2     y_2        dist
0  di-ab5   82.31  184.20  148.06  142.54   77.837125
1  di-de6   92.35  185.21   24.12   16.45  182.030960
2  di-gh7  123.45    0.01     NaN     NaN         NaN
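A closely related option, not taken from the answers here, is np.hypot, which computes sqrt(a**2 + b**2) element-wise and returns NaN for the row with missing coordinates. A minimal sketch on the sample rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'uuid': ['di-ab5', 'di-de6', 'di-gh7'],
    'x_1': [82.31, 92.35, 123.45],
    'y_1': [184.20, 185.21, 0.01],
    'x_2': [148.06, 24.12, np.nan],
    'y_2': [142.54, 16.45, np.nan],
})

# np.hypot(dx, dy) == sqrt(dx**2 + dy**2), applied element-wise to two Series
df['dist'] = np.hypot(df['x_1'] - df['x_2'], df['y_1'] - df['y_2'])
print(df[['uuid', 'dist']])
```

np.hypot is a NumPy ufunc, so calling it on pandas Series returns a Series aligned with df.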
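Alternatively, wrap the formula in a reusable function:

def getDist(df, a, b):
    # Row-wise Euclidean distance between (x_a, y_a) and (x_b, y_b); NaN propagates
    return np.sqrt((df[f'x_{a}'] - df[f'x_{b}'])**2 + (df[f'y_{a}'] - df[f'y_{b}'])**2)
df['dist'] = getDist(df, 1, 2)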
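Another option selects the x and y columns with DataFrame.filter and takes the difference across columns:

df['dist'] = np.sqrt(df.filter(like='x').agg('diff', axis=1).sum(axis=1, min_count=1)**2
                     + df.filter(like='y').agg('diff', axis=1).sum(axis=1, min_count=1)**2)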

How it works:

Select the x and y columns separately:

df.filter(like='x')

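Take the difference across the columns and square it. diff leaves the first column as NaN and puts x_2 - x_1 in the second, so sum(axis=1) collapses each row to that single difference. min_count=1 keeps a row NaN when its difference is NaN (because a coordinate is missing); with the default min_count=0 the sum would be 0:

df.filter(like='x').agg('diff', axis=1).sum(axis=1, min_count=1)**2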

Add the two results together and take the square root:

np.sqrt(df.filter(like='x').agg('diff', axis=1).sum(axis=1, min_count=1)**2 + df.filter(like='y').agg('diff', axis=1).sum(axis=1, min_count=1)**2)
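To see why the NaN handling matters in the filter/diff approach, here is a minimal check on a single hypothetical row whose second point is missing (like row 2 of the sample). It uses DataFrame.diff(axis=1), the same column-wise difference as agg('diff', axis=1), and compares summing with and without min_count:

```python
import numpy as np
import pandas as pd

# Hypothetical single row with the second x coordinate missing
row = pd.DataFrame({'x_1': [123.45], 'x_2': [np.nan]})

# Column-wise difference: first column is always NaN, second is x_2 - x_1 (NaN here)
d = row.diff(axis=1)

default_sum = d.sum(axis=1)               # skipna with min_count=0: all-NaN row sums to 0.0
guarded_sum = d.sum(axis=1, min_count=1)  # fewer than 1 non-NaN value gives NaN

print(default_sum.tolist())  # [0.0]
print(guarded_sum.tolist())  # [nan]
```

Without min_count=1, a row with missing coordinates would silently report a distance of 0 instead of NaN.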

Another solution using NumPy:

diff = df[['x_1', 'y_1']].to_numpy() - df[['x_2', 'y_2']].to_numpy()
df['dist'] = np.sqrt((diff * diff).sum(axis=-1))

Output:

    uuid    x_1     y_1     x_2     y_2     dist
0   di-ab5  82.31   184.20  148.06  142.54  77.837125
1   di-de6  92.35   185.21  24.12   16.45   182.030960
2   di-gh7  123.45  0.01    NaN     NaN     NaN
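Caveat 2 in the question allows storing each point as a single [x, y] value. A minimal sketch of that layout, assuming hypothetical columns p_1 and p_2 that hold two-element lists and a missing point stored as [nan, nan] (a bare NaN would make the nested lists ragged and np.array would fail): stack each column into an (n, 2) array, then take the row-wise norm.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: each point is a two-element [x, y] list
df = pd.DataFrame({
    'uuid': ['di-ab5', 'di-de6', 'di-gh7'],
    'p_1': [[82.31, 184.20], [92.35, 185.21], [123.45, 0.01]],
    'p_2': [[148.06, 142.54], [24.12, 16.45], [np.nan, np.nan]],
})

# Stack the lists into (n_rows, 2) float arrays
p1 = np.array(df['p_1'].tolist(), dtype=float)
p2 = np.array(df['p_2'].tolist(), dtype=float)

# 2-norm of each row of the difference; NaN coordinates give NaN
df['dist'] = np.linalg.norm(p1 - p2, axis=1)
print(df[['uuid', 'dist']])
```

The same code works unchanged for points with more than two coordinates, since the norm is taken over whatever length the lists have.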
