Calculate the distance between two data frames as a cross-distance matrix and find the nearest location in Python

The data is as follows:

import pandas as pd
city_data = {'City': ['Delhi', 'Mumbai'],
        'Lat': [28.7041, 19.0760,],
        'Long':[77.1025,72.8777] }
person_data = {'City': ['A', 'B'],
        'Lat': [12.9716, 13.0827,],
        'Long':[77.5946,80.2707] }
df_city = pd.DataFrame(city_data)
df_person = pd.DataFrame(person_data)

Output 1 needed: (image not included)

Output 2 needed: (image not included)

The distance should be calculated with the haversine formula. There are 1000+ people and 300+ locations.
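For reference, the haversine (great-circle) distance mentioned above is commonly written as shown here. The symbols are assumptions introduced for illustration and do not come from the original post: \varphi is latitude and \lambda is longitude, both in radians, and R is the sphere radius (about 6371 km for the mean Earth radius).

```latex
d = 2R \arcsin\left(
      \sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
            + \cos\varphi_1 \cos\varphi_2
              \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
    \right)
```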

Here's a way of doing this using sklearn.metrics.pairwise.haversine_distances, which computes the distance for every pair of coordinates. Note that I added another city so that the two lists would be different sizes, just to make sure the array coordinates were in the right order:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

city_data = {'City': ['Delhi', 'Mumbai','Jakarta'],
        'Lat': [28.7041, 19.0760,6.175],
        'Long':[77.1025,72.8777,106.8275] }
person_data = {'Person': ['A', 'B'],
        'Lat': [12.9716, 13.0827,],
        'Long':[77.5946,80.2707] }
df_city = pd.DataFrame(city_data)
df_person = pd.DataFrame(person_data)
print(df_city)
print()
print(df_person)
print()

# Extract as arrays and convert to radians.

c1 = np.radians(df_city[['Lat','Long']].to_numpy())
c2 = np.radians(df_person[['Lat','Long']].to_numpy())

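# Compute distances in kilometers. haversine_distances returns the angular
# distance in radians, so scale by the mean Earth radius (6371 km).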

dist = haversine_distances(c2, c1) * 6371000/1000
print(dist)
print()


# Convert back to dataframe.

df = pd.DataFrame( dist, columns=df_city['City'], index=df_person['Person'])
print(df)
print()

# Sort the data and return the indexes of the closest two.

distsort = dist.argsort(axis=1)[:,:2]

# Look those up in the city names.

distsort = df_city['City'].to_numpy()[distsort]

df2 = pd.DataFrame( distsort, columns=['Closest','Next'], index=df_person['Person'])
print(df2)

Output:

      City      Lat      Long
0    Delhi  28.7041   77.1025
1   Mumbai  19.0760   72.8777
2  Jakarta   6.1750  106.8275

  Person      Lat     Long
0      A  12.9716  77.5946
1      B  13.0827  80.2707

[[1750.11476241  845.31838566 3290.21557611]
 [1767.65141115 1033.09851229 3008.41699612]]

City          Delhi       Mumbai      Jakarta
Person                                       
A       1750.114762   845.318386  3290.215576
B       1767.651411  1033.098512  3008.416996

       Closest   Next
Person               
A       Mumbai  Delhi
B       Mumbai  Delhi
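
As a cross-check on the matrix above, here is a minimal sketch that recomputes the same distances with only the standard library and also reports the distance to the nearest city. The names haversine_km, EARTH_RADIUS_KM, matrix and nearest are hypothetical helpers introduced for this illustration; they are not part of the original answer or of scikit-learn.

```python
import math

# Assumption: mean Earth radius in km, the same value as the 6371000/1000 factor above.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


cities = {
    "Delhi": (28.7041, 77.1025),
    "Mumbai": (19.0760, 72.8777),
    "Jakarta": (6.175, 106.8275),
}
people = {"A": (12.9716, 77.5946), "B": (13.0827, 80.2707)}

# Cross-distance table: one row per person, one entry per city.
matrix = {
    person: {city: haversine_km(*p_xy, *c_xy) for city, c_xy in cities.items()}
    for person, p_xy in people.items()
}

# Nearest city and its distance for each person.
nearest = {
    person: min(row.items(), key=lambda item: item[1])
    for person, row in matrix.items()
}

for person, (city, km) in nearest.items():
    print(f"{person}: nearest is {city} at {km:.1f} km")
```

The values in matrix should agree with the DataFrame printed above to within floating-point rounding. A plain double loop like this is only meant for checking a few pairs; for 1000+ people and 300+ locations the vectorized haversine_distances call is the more practical choice.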
