
Group dataframe by distance to datapoints in a different dataframe using long and lat

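I have two dataframes. One contains a number of power plants with their longitude and latitude, each in its own column. The other contains several substations, also with long and lat. What I want to do is assign each power plant to its nearest substation.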

df1 = pd.DataFrame({'ID_pp':['p1','p2','p3','p4'],'x':[12.644881,11.563269, 12.644881,  8.153184], 'y':[48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss':['s1','s2','s3','s4'],'x':[9.269, 9.390, 9.317, 10.061], 'y':[55.037, 54.940, 54.716, 54.349]})

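I think I need to calculate the distances between all points and then group the dataframe, but I am not sure how. I found the numpy.linalg.norm() function, but it did not work for me. Any help is appreciated.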
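One possible reason a plain Euclidean norm is a poor fit here (an assumption, since the question does not say what went wrong) is that x and y are in degrees, and a degree of longitude covers less ground the further you are from the equator. A common alternative is the haversine formula for great-circle distance on a sphere. In the notation below, \varphi is latitude and \lambda is longitude (both in radians), and R is the Earth's radius, roughly 6371 km; these symbols are introduced for illustration and do not appear in the original post.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \,\arcsin\!\sqrt{a}
```

This treats the Earth as a sphere, so results can differ from an ellipsoidal (geodesic) distance by a fraction of a percent.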

I found this solution, which is basically exactly what I need:

import pandas as pd
import geopy.distance

for i, row in df1.iterrows():  # A: power plants
    a = (row.y, row.x)  # geopy expects (latitude, longitude)
    distances = []
    for j, row2 in df2.iterrows():  # B: substations
        b = (row2.y, row2.x)
        distances.append(geopy.distance.geodesic(a, b).km)

    # position in df2 of the nearest substation, and its distance
    min_distance = min(distances)
    min_index = distances.index(min_distance)

    print("A", i, "is closest to B", min_index, min_distance, "km")

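It works, but it takes forever, and my dataset is very large. I think an approach using .apply might be faster. Does anyone know how to adapt this to the apply method?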
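As an alternative to both the nested loop and .apply, the pairwise distances can be computed in one shot with NumPy broadcasting. The sketch below is not from the original post: it swaps geopy's geodesic distance for the spherical haversine formula (slightly less accurate, but vectorizable), and the helper name nearest_by_haversine and the constant EARTH_RADIUS_KM are made up for illustration. It assumes x is longitude and y is latitude, both in degrees.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def nearest_by_haversine(points, candidates):
    """For each row of `points`, return the positional index into
    `candidates` of the nearest row and the distance to it in km.
    Both frames need columns 'x' (longitude) and 'y' (latitude) in degrees."""
    # Column vectors for points, row vectors for candidates, so that
    # arithmetic broadcasts to a (len(points), len(candidates)) matrix.
    lat1 = np.radians(points["y"].to_numpy())[:, None]
    lon1 = np.radians(points["x"].to_numpy())[:, None]
    lat2 = np.radians(candidates["y"].to_numpy())[None, :]
    lon2 = np.radians(candidates["x"].to_numpy())[None, :]

    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    idx = dist.argmin(axis=1)  # nearest candidate per point
    return idx, dist[np.arange(len(idx)), idx]


df1 = pd.DataFrame({'ID_pp': ['p1', 'p2', 'p3', 'p4'],
                    'x': [12.644881, 11.563269, 12.644881, 8.153184],
                    'y': [48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss': ['s1', 's2', 's3', 's4'],
                    'x': [9.269, 9.390, 9.317, 10.061],
                    'y': [55.037, 54.940, 54.716, 54.349]})

idx, km = nearest_by_haversine(df1, df2)
result = df1.assign(closest_ss=df2["ID_ss"].to_numpy()[idx], distance_km=km)
print(result)
```

Note that this builds the full distance matrix in memory, so for very large inputs it would probably need to be run over df1 in chunks. Grouping by substation afterwards is then just result.groupby("closest_ss").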

Here is a solution using geopandas. I don't know how well it performs on larger datasets.

import geopandas as gpd
import pandas as pd

df1 = pd.DataFrame({'ID_pp':['p1','p2','p3','p4'],'x':[12.644881,11.563269, 12.644881,  8.153184], 'y':[48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss':['s1','s2','s3','s4'],'x':[9.269, 9.390, 9.317, 10.061], 'y':[55.037, 54.940, 54.716, 54.349]})

# create GeoDataFrames from the original dfs
gdf1 = gpd.GeoDataFrame(df1[['ID_pp']], geometry=gpd.points_from_xy(df1['x'], df1['y']), crs='EPSG:4326')
gdf2 = gpd.GeoDataFrame(df2[['ID_ss']], geometry=gpd.points_from_xy(df2['x'], df2['y']), crs='EPSG:4326')

# convert to another coordinate reference system for units in metres, EPSG:5243 suits Germany as far as I know 
gdf1 = gdf1.to_crs('EPSG:5243')
gdf2 = gdf2.to_crs('EPSG:5243')

gdf2 = gdf2.set_index('ID_ss')

def get_closest_ss(point, other):
    s = other.distance(point)
    return (s.idxmin(), s.min())

# find ID of closest substation to all power plants
gdf1[['closest_ss', 'distance']] = gdf1.geometry.apply(get_closest_ss, args=(gdf2,)).to_list()

# merge the dataframe with the power plants (gdf1) with the closest substation (gdf2)
gdf = gdf1.merge(gdf2, left_on='closest_ss', right_index=True, suffixes=('', '_ss'))

print(gdf)

# output

  ID_pp                         geometry closest_ss       distance  \
0    p1   POINT (159807.847 -320153.333)         s4  717896.945731   
1    p2    POINT (79356.344 -330713.037)         s4  711534.096071   
2    p3   POINT (159807.847 -320153.333)         s4  717896.945731   
3    p4  POINT (-171106.060 -202478.708)         s4  592470.679838   

                     geometry_ss  
0  POINT (-28563.516 372589.227)  
1  POINT (-28563.516 372589.227)  
2  POINT (-28563.516 372589.227)  
3  POINT (-28563.516 372589.227) 
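Since the answer above is unsure how it scales, one option that may help on large inputs is a spatial index, so that each power plant is not compared against every substation. The following is a minimal sketch, not part of the original answer: it uses scipy.spatial.cKDTree on points converted to 3D unit vectors, where the nearest neighbour by straight-line (chord) distance is also the nearest by great-circle distance. The helper to_unit_xyz and the constant EARTH_RADIUS_KM are hypothetical names, and the Earth is treated as a sphere.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def to_unit_xyz(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to points on the unit sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


df1 = pd.DataFrame({'ID_pp': ['p1', 'p2', 'p3', 'p4'],
                    'x': [12.644881, 11.563269, 12.644881, 8.153184],
                    'y': [48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss': ['s1', 's2', 's3', 's4'],
                    'x': [9.269, 9.390, 9.317, 10.061],
                    'y': [55.037, 54.940, 54.716, 54.349]})

# Build the index once over the substations, then query all plants at once.
tree = cKDTree(to_unit_xyz(df2["x"], df2["y"]))
chord, idx = tree.query(to_unit_xyz(df1["x"], df1["y"]), k=1)

# Convert chord length on the unit sphere back to arc length in km.
arc_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2, 0.0, 1.0))

nearest = df1.assign(closest_ss=df2["ID_ss"].to_numpy()[idx],
                     distance_km=arc_km)
print(nearest)
```

The distances here are spherical, so they will differ slightly from the projected distances in the geopandas output; if the exact figures matter, the matched pairs could be re-measured afterwards with a geodesic calculation.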
