Find closest point in Pandas DataFrames

I am quite new to Python. I have the following table in Postgres. These are polygon values: each zone has four coordinates that share the same Id and Zone name. I have stored this data in a Python DataFrame called df1.

Id  Order   Lat              Lon            Zone
00001   1   50.6373473  3.075029928          A
00001   2   50.63740441 3.075068636          A
00001   3   50.63744285 3.074951754          A 
00001   4   50.63737839 3.074913884          A 
00002   1   50.6376054  3.0750528            B
00002   2   50.6375896  3.0751209            B
00002   3   50.6374239  3.0750246            B
00002   4   50.6374404  3.0749554            B

I have JSON data with Lon and Lat values, which I have stored in a Python DataFrame called df2.

Lat                  Lon
50.6375524099   3.07507914474
50.6375714407   3.07508201591

My task is to compare the df2 Lat and Lon values with the four coordinates of each zone in df1, extract the zone name, and add it to df2.

For instance, (50.637552409 3.07507914474) belongs to Zone B.

import numpy as np
import pandas as pd
import matplotlib.path as mplPath

# Zone polygons: Id, Order, Lat, Lon, Zone (engine is a database connection created elsewhere)
df1 = pd.read_sql_query("""SELECT * from "zmap" """, con=engine)
# Points with lat, lon values
df2 = pd.read_sql_query("""SELECT * from "E1" """, con=engine)
# zip() is lazy in Python 3, so materialize it before assigning to a column
df2['latlon'] = list(zip(df2.lat, df2.lon))
zones = [
    ["A", [[50.637347297, 3.075029928], [50.637404408, 3.075068636], [50.637442847, 3.074951754], [50.637378390, 3.074913884]]]]
for i in range(len(zones)):  # for each zone's points
    X = mplPath.Path(np.array(zones[i][1]))
    # find which points are inside the current zone
    Y = X.contains_points(df2.latlon.values.tolist())
    # label points that are in the current zone (boolean mask needs .loc)
    df2.loc[Y, 'zone'] = zones[i][0]

Currently I have done it manually for Zone 'A'. I need to generate the "Zones" for the coordinates in df2.

This sounds like a good use case for scipy cdist, also discussed here.

import pandas as pd
from scipy.spatial.distance import cdist


data1 = {'Lat': pd.Series([50.6373473,50.63740441,50.63744285,50.63737839,50.6376054,50.6375896,50.6374239,50.6374404]),
         'Lon': pd.Series([3.075029928,3.075068636,3.074951754,3.074913884,3.0750528,3.0751209,3.0750246,3.0749554]),
         'Zone': pd.Series(['A','A','A','A','B','B','B','B'])}

data2 = {'Lat': pd.Series([50.6375524099,50.6375714407]),
         'Lon': pd.Series([3.07507914474,3.07508201591])}


def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]


df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df1['point'] = [(x, y) for x,y in zip(df1['Lat'], df1['Lon'])]
df2['point'] = [(x, y) for x,y in zip(df2['Lat'], df2['Lon'])]

df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['zone'] = [match_value(df1, 'point', x, 'Zone') for x in df2['closest']]

print(df2)
#    Lat        Lon       point                           closest                  zone
# 0  50.637552  3.075079  (50.6375524099, 3.07507914474)  (50.6375896, 3.0751209)  B
# 1  50.637571  3.075082  (50.6375714407, 3.07508201591)  (50.6375896, 3.0751209)  B
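
A possible vectorized variant of the same idea, shown only as a sketch: cdist can compute the full distance matrix between df2 and df1 in one call, so the per-row helper functions are not needed. The column names follow the example above; the variable names distances and nearest are illustrative. Note that this assigns the zone of the nearest polygon vertex, which is not guaranteed to be the zone that actually contains the point.

```python
import pandas as pd
from scipy.spatial.distance import cdist

df1 = pd.DataFrame({
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
})
df2 = pd.DataFrame({
    'Lat': [50.6375524099, 50.6375714407],
    'Lon': [3.07507914474, 3.07508201591],
})

# Distance matrix of shape (len(df2), len(df1)): one row per query point.
distances = cdist(df2[['Lat', 'Lon']].to_numpy(), df1[['Lat', 'Lon']].to_numpy())

# Position of the nearest df1 vertex for each df2 point.
nearest = distances.argmin(axis=1)

# Look up the zone of that nearest vertex by position.
df2['zone'] = df1['Zone'].to_numpy()[nearest]
print(df2)
```

The distances are plain Euclidean distances in degrees, which is a reasonable approximation only over small areas such as these zones.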

Note that the post is currently titled "Find closest point in Pandas DataFrames", but the OP's attempt shows that they are looking for the zone that contains a point.

It is possible to leverage the geopandas library to do this operation elegantly and efficiently.

Convert the DataFrame into a GeoDataFrame.

Then aggregate the points in df1 by Zone to create one polygon per zone. The aggregation operation is called dissolve; taking the convex hull of the dissolved points yields the polygon.

Finally, use a spatial join (sjoin) with a predicate such that points in df2 are covered by the polygon representing a Zone in zones, and output the Lat, Lon and Zone columns.

# set up
import pandas as pd
import geopandas as gpd

df1 = pd.DataFrame({
  'Id': [1, 1, 1, 1, 2, 2, 2, 2],
  'Order': [1, 2, 3, 4, 1, 2, 3, 4],
  'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839, 50.6376054, 50.6375896, 50.6374239, 50.6374404], 
  'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884, 3.0750528, 3.0751209, 3.0750246, 3.0749554],
  'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
})

df2 = pd.DataFrame({
  'Lat': [50.6375524099, 50.6375714407],
  'Lon': [3.07507914474, 3.07508201591] 
})

# convert to GeoDataFrame
df1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1.Lon, df1.Lat))
df2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2.Lon, df2.Lat))

# aggregate & merge
zones = df1.dissolve(by='Zone').convex_hull.rename('geometry').reset_index()
merged = df2.sjoin(zones, how='left', predicate='covered_by')

# output
output_columns = ['Lat', 'Lon', 'Zone']
merged[output_columns]

This outputs:

         Lat       Lon Zone
0  50.637552  3.075079    B
1  50.637571  3.075082    B
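
For readers without geopandas, the OP's matplotlib attempt can be generalized so that the zones are built from df1 instead of being typed in by hand. The following is a minimal sketch, assuming that the Order column lists each zone's vertices in drawing order and that zones do not overlap; the names points, polygon and inside are illustrative.

```python
import pandas as pd
from matplotlib.path import Path

df1 = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 2, 2],
    'Order': [1, 2, 3, 4, 1, 2, 3, 4],
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
})
df2 = pd.DataFrame({
    'Lat': [50.6375524099, 50.6375714407],
    'Lon': [3.07507914474, 3.07508201591],
})

points = df2[['Lat', 'Lon']].to_numpy()
df2['zone'] = None  # points that fall in no zone keep None

# Build one polygon per zone from its vertices, taken in Order.
for zone_name, group in df1.sort_values('Order').groupby('Zone'):
    polygon = Path(group[['Lat', 'Lon']].to_numpy())
    # Boolean mask: True where a df2 point lies inside this polygon.
    inside = polygon.contains_points(points)
    df2.loc[inside, 'zone'] = zone_name

print(df2)
```

Unlike the convex hull used above, this follows the vertex order given in df1, so it would also handle zones that are not convex. Points lying exactly on a polygon edge may be classified either way.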
