
How do I select objects within a geographic region in a pandas dataframe?

I'm trying to select objects within a region from a pandas dataframe that contains a list of item IDs and lat/lon pairs. Is there a selection method for this? I think this would be similar to the following SO question, but using pandas instead of SQL:

Selecting geographical points within area

Here is my table, saved in locations.csv:

ID, LAT, LON
001,35.00,-75.00
002,35.01,-80.00 
...
999,25.76,-64.00

I can load the dataframe and select a rectangular region:

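import pandas as pd

# skipinitialspace drops the blank after each comma in the header ("ID, LAT, LON")
df = pd.read_csv('locations.csv', delimiter=',', skipinitialspace=True)
lat_max = 32.323496
lat_min = 25.712767
lon_max = -72.863358
lon_min = -74.729456
# Combine the four bounds into a single boolean mask instead of chaining [] selections
in_box = (df['LAT'] > lat_min) & (df['LAT'] < lat_max) & (df['LON'] > lon_min) & (df['LON'] < lon_max)
small_df = df[in_box]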

How would I select objects within an irregular region?

How would I structure the dataframe selection command?

I can build a lambda function that returns True for a LAT and LON within the region, but I'm not sure how to use it with a pandas dataframe.

The working code below selects the points within a region by first creating two geodataframes. The first contains a polygon, and the second contains all the points to be spatially joined with the first. The spatial join operator "within" selects the points that fall inside the polygon. The result of the operation is also a geodataframe; it contains only the points that fall within the area of the polygon.

The content of locations.csv: 6 lines, including the column headers. Note: there are no spaces in the first row.

ID,LAT,LON
1, 15.1, 10.0
2, 15.2, 15.1
3, 15.3, 20.2
4, 15.4, 25.3
5, 15.5, 30.4

The code:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.wkt import loads

# Create a geodataframe `polygon_df` holding a single polygon.
# This polygon will be used to select points from another geodataframe.
d = {'poly_id': [1], 'wkt': ['POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))']}
df = pd.DataFrame(data=d)
geometry = [loads(pgon) for pgon in df.wkt]
# 'EPSG:4326' replaces the deprecated {'init': 'epsg:4326'} form
polygon_df = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)

# One can plot this polygon with the command:
# polygon_df.plot()

# Read the file with `pandas`
locs = pd.read_csv('locations.csv', sep=',')

# Build one Point per row (x = LON, y = LAT) and make a geodataframe `geo_locs`
locs_geom = [Point(xy) for xy in zip(locs.LON, locs.LAT)]
geo_locs = gpd.GeoDataFrame(locs, crs='EPSG:4326', geometry=locs_geom)

# Spatial join of the points `within` the polygon; the result `pts_in_poly` is a geodataframe.
# Note: `predicate` replaces the older `op` keyword (geopandas 0.10 and later).
pts_in_poly = gpd.sjoin(geo_locs, polygon_df, predicate='within', how='inner')

# Print the ID of the points that fall within the polygon.
print(pts_in_poly.ID)

# The output will be:
#2    3
#3    4
#4    5
#Name: ID, dtype: int64

# Plot the polygon and all the points.
ax1 = polygon_df.plot(color='lightgray', zorder=1)
geo_locs.plot(ax=ax1, zorder=5, color="red")

The output plot:

[image: the polygon in light gray with the five points in red]

In the plot, the points with IDs 3, 4, and 5 fall within the polygon.
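
The question also mentions a lambda that returns True for a LAT and LON inside the region. A minimal sketch of how such a predicate could be turned into a dataframe selection is shown below, using only pandas. The helper `in_region` and the vertex list `region` are hypothetical names introduced for illustration; the ray-casting test stands in for whatever predicate you already have and is not the geopandas method used above. The data repeats the five points and the polygon from the example.

```python
import pandas as pd

def in_region(lon, lat, vertices):
    """Ray-casting test: True if (lon, lat) lies inside the polygon `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Same polygon and points as in the example above, as (lon, lat) pairs
region = [(30, 10), (40, 40), (20, 40), (10, 20)]
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'LAT': [15.1, 15.2, 15.3, 15.4, 15.5],
    'LON': [10.0, 15.1, 20.2, 25.3, 30.4],
})

# apply(..., axis=1) calls the predicate once per row and returns a boolean Series
mask = df.apply(lambda row: in_region(row['LON'], row['LAT'], region), axis=1)
selected = df[mask]
print(selected['ID'].tolist())
```

Because the predicate runs once per row in Python, this is likely to be slower than the spatial join on large dataframes, but it needs no extra libraries and works with any function that returns True or False for a LAT/LON pair.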
