I am quite new to Python. I have the following table in Postgres. Each polygon is described by four coordinates that share the same Id and Zone name. I have stored this data in a pandas DataFrame called df1.
Id Order Lat Lon Zone
00001 1 50.6373473 3.075029928 A
00001 2 50.63740441 3.075068636 A
00001 3 50.63744285 3.074951754 A
00001 4 50.63737839 3.074913884 A
00002 1 50.6376054 3.0750528 B
00002 2 50.6375896 3.0751209 B
00002 3 50.6374239 3.0750246 B
00002 4 50.6374404 3.0749554 B
I also have JSON data with Lon and Lat values, which I have stored in a pandas DataFrame called df2.
Lat Lon
50.6375524099 3.07507914474
50.6375714407 3.07508201591
My task is to compare the Lat and Lon values in df2 with the four coordinates of each zone in df1, extract the zone name, and add it to df2. For instance, (50.6375524099, 3.07507914474) belongs to Zone B.
import numpy as np
import pandas as pd
import matplotlib.path as mplPath

# This is ID with Zone
df1 = pd.read_sql_query("""SELECT * from "zmap" """, con=engine)
# This is with lat, lon values
df2 = pd.read_sql_query("""SELECT * from "E1" """, con=engine)
# zip() is lazy in Python 3, so materialize it before assigning
df2['latlon'] = list(zip(df2.lat, df2.lon))
zones = [
    ["A", [[50.637347297, 3.075029928], [50.637404408, 3.075068636], [50.637442847, 3.074951754], [50.637378390, 3.074913884]]]]
for i in range(0, len(zones)):  # for each zone's points
    X = mplPath.Path(np.array(zones[i][1]))
    # find which points are inside the current zone
    Y = X.contains_points(df2.latlon.values.tolist())
    # Label points that are in the current zone (boolean mask needs .loc)
    df2.loc[Y, 'zone'] = zones[i][0]
Currently I have done it manually for Zone 'A'. I need to generate the "Zones" for the coordinates in df2.
This sounds like a good use case for SciPy's cdist, also discussed here.
import pandas as pd
from scipy.spatial.distance import cdist
data1 = {'Lat': pd.Series([50.6373473,50.63740441,50.63744285,50.63737839,50.6376054,50.6375896,50.6374239,50.6374404]),
         'Lon': pd.Series([3.075029928,3.075068636,3.074951754,3.074913884,3.0750528,3.0751209,3.0750246,3.0749554]),
         'Zone': pd.Series(['A','A','A','A','B','B','B','B'])}
data2 = {'Lat': pd.Series([50.6375524099,50.6375714407]),
         'Lon': pd.Series([3.07507914474,3.07508201591])}
def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]
def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1['point'] = [(x, y) for x,y in zip(df1['Lat'], df1['Lon'])]
df2['point'] = [(x, y) for x,y in zip(df2['Lat'], df2['Lon'])]
df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['zone'] = [match_value(df1, 'point', x, 'Zone') for x in df2['closest']]
print(df2)
# Lat Lon point closest zone
# 0 50.637552 3.075079 (50.6375524099, 3.07507914474) (50.6375896, 3.0751209) B
# 1 50.637571 3.075082 (50.6375714407, 3.07508201591) (50.6375896, 3.0751209) B
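The helper above calls cdist once per point in df2. As a rough sketch of the same nearest-vertex idea, a single cdist call can produce the whole distance matrix at once. The variable names dist and nearest are my own, and this assumes plain Euclidean distance on raw Lat/Lon values is acceptable at this small scale. Note that it only finds the zone of the nearest vertex, which is not necessarily the zone that contains the point.

```python
import pandas as pd
from scipy.spatial.distance import cdist

df1 = pd.DataFrame({
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
})
df2 = pd.DataFrame({
    'Lat': [50.6375524099, 50.6375714407],
    'Lon': [3.07507914474, 3.07508201591],
})

# Pairwise Euclidean distances: one row per df2 point, one column per df1 vertex.
dist = cdist(df2[['Lat', 'Lon']].to_numpy(), df1[['Lat', 'Lon']].to_numpy())

# Positional index of the nearest df1 vertex for each df2 point.
nearest = dist.argmin(axis=1)

# Look up the zone of that nearest vertex.
df2['zone'] = df1['Zone'].to_numpy()[nearest]
print(df2)
```

For the sample data both points end up in zone B, matching the output above.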
Note that the current title of the post is "Find closest point in Pandas DataFrames", but the OP's attempt shows that they are looking for the zone within which a point is found.
It is possible to leverage the geopandas library to do this operation elegantly and efficiently.
Convert each DataFrame into a GeoDataFrame.
Then aggregate the points in df1 by Zone and take the convex hull of each group to create one polygon per zone. The aggregation operation is called dissolve.
Finally, use a spatial join (sjoin) with the covered_by predicate, so that each point in df2 is matched to the polygon in zones that covers it, and output the Lat, Lon and Zone columns.
# set up
import pandas as pd
import geopandas as gpd
df1 = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 2, 2],
    'Order': [1, 2, 3, 4, 1, 2, 3, 4],
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839, 50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884, 3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
})
df2 = pd.DataFrame({
    'Lat': [50.6375524099, 50.6375714407],
    'Lon': [3.07507914474, 3.07508201591]
})
# convert to GeoDataFrame
df1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1.Lon, df1.Lat))
df2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2.Lon, df2.Lat))
# aggregate & merge
zones = df1.dissolve(by='Zone').convex_hull.rename('geometry').reset_index()
merged = df2.sjoin(zones, how='left', predicate='covered_by')
# output
output_columns = ['Lat', 'Lon', 'Zone']
merged[output_columns]
This outputs:
Lat Lon Zone
0 50.637552 3.075079 B
1 50.637571 3.075082 B
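If installing geopandas is not an option, the same containment test can be sketched with matplotlib.path.Path, as in the question's attempt, by building one path per zone from df1 instead of typing the coordinates by hand. This is only a sketch under a few assumptions: the Order column gives the vertex order around each polygon, every zone is a simple polygon, and the loop variable names are my own. Unlike the convex hull approach, it uses the vertices in their stated order.

```python
import pandas as pd
from matplotlib.path import Path

df1 = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 2, 2],
    'Order': [1, 2, 3, 4, 1, 2, 3, 4],
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
})
df2 = pd.DataFrame({
    'Lat': [50.6375524099, 50.6375714407],
    'Lon': [3.07507914474, 3.07508201591],
})

# All query points as an (N, 2) array of (Lat, Lon) pairs.
points = df2[['Lat', 'Lon']].to_numpy()

# Points that fall in no zone keep None.
df2['Zone'] = None

# Sort by Order so each group's vertices follow the polygon outline.
for zone, group in df1.sort_values('Order').groupby('Zone'):
    # Path treats the outline as closed (last vertex joins the first).
    polygon = Path(group[['Lat', 'Lon']].to_numpy())
    inside = polygon.contains_points(points)
    df2.loc[inside, 'Zone'] = zone

print(df2)
```

With the sample data both points are labelled B, in agreement with the geopandas result. If zones could overlap, a point would keep the label of the last zone that contains it, so that case would need separate handling.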