Matching geographic coordinates between two data frames

I have two data frames, DF1 and DF2, each with longitude and latitude columns:

DF1 = pd.DataFrame([[19.827658,-20.372238,8614], [19.825407,-20.362608,7412], [19.081514,-17.134456,8121]], columns=['Longitude1', 'Latitude1','Echo_top_height'])
DF2 = pd.DataFrame([[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514,-17.154456, 285.349994]], columns=['Longitude2', 'Latitude2','BT'])

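I need to find the longitude/latitude pairs in DF1 that match a longitude/latitude pair in DF2. Where the data match, the corresponding value from the BT column of DF2 should be added to DF1.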

I used the code from here and managed to check whether there are any matches:

import numpy as np
from sklearn.metrics.pairwise import haversine_distances
threshold = 5000 # meters
earth_radius = 6371000  # meters
DF1['nearby'] = (
# get the distance between all points of each DF
haversine_distances(
    # haversine_distances expects [latitude, longitude] in radians, hence *np.pi/180
    X=DF1[['Latitude1','Longitude1']].to_numpy()*np.pi/180,
    Y=DF2[['Latitude2','Longitude2']].to_numpy()*np.pi/180)
*earth_radius < threshold).any(axis=1).astype(int)

So the result I need looks like this:

Longitude1 Latitude1 Echo_top_height   BT
19.82       -20.37       8614         290.345
19.82       -20.36       7412         289.235
and so on...
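
For reference, the distance check in the question rests on the haversine formula. The following is a minimal NumPy sketch of that formula for a single pair of points, not the scikit-learn implementation; the function name `haversine_m` and its `radius` parameter are made up for illustration, and the Earth radius is the value used in the question.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000):
    """Great-circle distance in meters between two points given in degrees."""
    # Convert all four angles from degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

# Third row of DF1 against the first row of DF2
d = haversine_m(-17.134456, 19.081514, -17.151207, 19.083727)
print(d < 5000)
```

Note the argument order: latitude first, then longitude, which is also the column order that `haversine_distances` expects.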

You can use a BallTree:

import numpy as np
# Update: in newer versions of sklearn, DistanceMetric lives in sklearn.metrics
from sklearn.neighbors import BallTree
from sklearn.metrics import DistanceMetric
# Older versions: from sklearn.neighbors import BallTree, DistanceMetric

# Build the tree on DF2 (haversine expects [latitude, longitude] in radians)
coords = np.radians(DF2[['Latitude2', 'Longitude2']])
dist = DistanceMetric.get_metric('haversine')
tree = BallTree(coords, metric=dist)

# Query the nearest DF2 point for each row of DF1
coords = np.radians(DF1[['Latitude1', 'Longitude1']])
distances, indices = tree.query(coords, k=1)
DF1['BT'] = DF2['BT'].iloc[indices.flatten()].values
DF1['Distance'] = distances.flatten()  # in radians; multiply by the Earth's radius for meters

Output:

Longitude1  Latitude1  Echo_top_height  BT      Distance
19.8277     -20.3722   8614             284.35  0.0572097
19.8254     -20.3626   7412             284.35  0.0570377
19.0815     -17.1345   8121             285.32  0.000294681
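
The nearest-neighbour query always returns a neighbour, however far away, and the haversine distances come back in radians. One possible way to combine it with the 5000 m threshold from the question is sketched below. The column name `Distance_m`, the `matched` frame and the constant names are assumptions for illustration, and the metric is passed as the string `'haversine'` rather than as a `DistanceMetric` object.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

DF1 = pd.DataFrame(
    [[19.827658, -20.372238, 8614],
     [19.825407, -20.362608, 7412],
     [19.081514, -17.134456, 8121]],
    columns=['Longitude1', 'Latitude1', 'Echo_top_height'])
DF2 = pd.DataFrame(
    [[19.083727, -17.151207, 285.319994],
     [19.169403, -17.154144, 284.349994],
     [19.081514, -17.154456, 285.349994]],
    columns=['Longitude2', 'Latitude2', 'BT'])

EARTH_RADIUS_M = 6371000  # meters, same value as in the question
THRESHOLD_M = 5000        # meters, same value as in the question

# Build the tree on DF2; haversine expects [latitude, longitude] in radians
tree = BallTree(np.radians(DF2[['Latitude2', 'Longitude2']].to_numpy()),
                metric='haversine')

# Nearest DF2 point for each DF1 point
distances, indices = tree.query(
    np.radians(DF1[['Latitude1', 'Longitude1']].to_numpy()), k=1)

# Convert the angular distances (radians) to meters
distance_m = distances.flatten() * EARTH_RADIUS_M
nearest_bt = DF2['BT'].to_numpy()[indices.flatten()]

matched = DF1.copy()
matched['Distance_m'] = distance_m
# Keep BT only where the nearest point lies within the threshold
matched['BT'] = np.where(distance_m < THRESHOLD_M, nearest_bt, np.nan)
print(matched)
```

With the sample data, only the third row of DF1 has a DF2 point within 5000 m, so the other two rows get NaN in BT.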

It looks like you are comparing the data frames by index, so you can use join and then drop the unnecessary rows and columns:

DF3 = DF1.join(DF2[['BT']])
DF3 = DF3[DF3['nearby'].eq(1)].drop('nearby', axis=1)
DF3

Complete reproducible code:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import haversine_distances
DF1 = pd.DataFrame([[19.827658,-20.372238,8614], [19.825407,-20.362608,7412], [19.081514,-17.134456,8121]], columns=['Longitude1', 'Latitude1','Echo_top_height'])
DF2 = pd.DataFrame([[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514,-17.154456, 285.349994]], columns=['Longitude2', 'Latitude2','BT'])
DF1, DF2
threshold = 5000 # meters
earth_radius = 6371000  # meters
DF1['nearby'] = (
# get the distance between all points of each DF
haversine_distances(
    # haversine_distances expects [latitude, longitude] in radians, hence *np.pi/180
    X=DF1[['Latitude1','Longitude1']].to_numpy()*np.pi/180, 
    Y=DF2[['Latitude2','Longitude2']].to_numpy()*np.pi/180)
*earth_radius < threshold).any(axis=1).astype(int)

DF3 = DF1.join(DF2[['BT']])
DF3 = DF3[DF3['nearby'].eq(1)].drop('nearby', axis=1)
DF3

Output:

Out[1]: 
   Longitude1  Latitude1  Echo_top_height          BT
2   19.081514 -17.134456             8121  285.349994
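
Note that join aligns rows by index, so row 2 of DF1 receives the BT of row 2 of DF2 (285.349994), although with this sample data the closest DF2 point is row 0. If the BT of the nearest point is wanted instead, a possible variation is to take the argmin of the same distance matrix. This is a sketch under that assumption; `dist_m`, `nearest` and `within` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

DF1 = pd.DataFrame(
    [[19.827658, -20.372238, 8614],
     [19.825407, -20.362608, 7412],
     [19.081514, -17.134456, 8121]],
    columns=['Longitude1', 'Latitude1', 'Echo_top_height'])
DF2 = pd.DataFrame(
    [[19.083727, -17.151207, 285.319994],
     [19.169403, -17.154144, 284.349994],
     [19.081514, -17.154456, 285.349994]],
    columns=['Longitude2', 'Latitude2', 'BT'])

threshold = 5000        # meters
earth_radius = 6371000  # meters

# Pairwise distances in meters, shape (len(DF1), len(DF2))
dist_m = haversine_distances(
    np.radians(DF1[['Latitude1', 'Longitude1']].to_numpy()),
    np.radians(DF2[['Latitude2', 'Longitude2']].to_numpy())) * earth_radius

nearest = dist_m.argmin(axis=1)             # position of the closest DF2 row
within = dist_m.min(axis=1) < threshold     # True where that row is close enough

# Keep only the DF1 rows with a match and attach the nearest BT
DF3 = DF1[within].copy()
DF3['BT'] = DF2['BT'].to_numpy()[nearest[within]]
print(DF3)
```

The full distance matrix has len(DF1) × len(DF2) entries, so for large frames a tree-based query is likely to scale better.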
