
Calculate Distance to Nearest Feature with Geopandas

I'm looking to do the equivalent of the ArcPy Generate Near Table using Geopandas / Shapely. I'm very new to Geopandas and Shapely and have developed a methodology that works, but I'm wondering if there is a more efficient way of doing it.

I have two point file datasets - Census Block Centroids and restaurants. I'm looking to find, for each Census Block centroid, the distance to its closest restaurant. There are no restrictions on the same restaurant being the closest restaurant for multiple blocks.

This becomes a bit more complicated for me because the Geopandas distance function calculates elementwise, matching based on index. Therefore, my general methodology is to turn the Restaurants file into a multipoint file and then set the index of the blocks file to all be the same value. Then all of the block centroids and the restaurants have the same index value.
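To illustrate the elementwise behaviour described above, here is a minimal sketch with made-up coordinates (the names `blocks` and `restaurants` and all values are assumptions for illustration, not the asker's data). Passing a GeoSeries pairs rows by index, whereas passing a single Shapely geometry compares every row against that one geometry, which may make the index workaround unnecessary.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point

# Hypothetical toy data: three block centroids and three restaurants.
blocks = gpd.GeoSeries([Point(0, 0), Point(5, 0), Point(10, 0)])
restaurants = gpd.GeoSeries([Point(0, 3), Point(9, 0), Point(10, 20)])

# GeoSeries vs GeoSeries: row i of blocks is paired with row i of restaurants.
elementwise = blocks.distance(restaurants)

# GeoSeries vs one geometry: every block is measured against the whole
# MultiPoint, so each value is the distance to the nearest restaurant.
all_restaurants = MultiPoint(restaurants.tolist())
nearest = blocks.distance(all_restaurants)

print(elementwise.tolist())
print(nearest.tolist())
```

In this toy case the third block is paired with a far-away restaurant in the elementwise result, while the second call reports its true nearest neighbour.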

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point, MultiPoint

Now read in the Block Centroid and Restaurant Shapefiles:

Blocks=gpd.read_file(BlockShp)
Restaurants=gpd.read_file(RestaurantShp)

Since the Geopandas distance function calculates distance elementwise, I convert the Restaurant GeoSeries to a MultiPoint GeoSeries:

RestMulti=gpd.GeoSeries(Restaurants.unary_union)
RestMulti.crs=Restaurants.crs
RestMulti=RestMulti.reset_index(drop=True)  # reset_index returns a new object, so assign it back

Then I set the index for the Blocks equal to 0 (the same value as the Restaurants multipoint) as a workaround for the elementwise calculation.

Blocks.index=[0]*len(Blocks)

Lastly, I use the Geopandas distance function to calculate the distance to the nearest restaurant for each Block centroid.

Blocks['Distance']=Blocks.distance(RestMulti)

Please offer any suggestions on how any aspect of this could be improved. I'm not tied to using Geopandas or Shapely, but I am looking to learn an alternative to ArcPy.

Thanks for the help!

If I understand your issue correctly, Blocks and Restaurants can have very different dimensions. For this reason, it's probably a bad approach to try to force them into a table format by reindexing.

I would just loop over blocks and get the minimum distance to restaurants (just as @shongololo was suggesting).

I'm going to be slightly more general (because I already have this code written down) and compute the distance from points to lines, but the same code should work from points to points or from polygons to polygons. I'll start with a GeoDataFrame for the points and I'll create a new column which has the minimum distance to lines.

%matplotlib inline
import matplotlib.pyplot as plt
import shapely.geometry as geom
import numpy as np
import pandas as pd
import geopandas as gpd

lines = gpd.GeoSeries(
    [geom.LineString(((1.4, 3), (0, 0))),
        geom.LineString(((1.1, 2.), (0.1, 0.4))),
        geom.LineString(((-0.1, 3.), (1, 2.)))])

# 10 points
n = 10
points = gpd.GeoSeries([geom.Point(x, y) for x, y in np.random.uniform(0, 3, (n, 2))])

# Put the points in a GeoDataFrame, with some other random column.
# Passing geometry='Geometry' marks that column as the active geometry,
# so that df_points.geometry works later on.
df_points = gpd.GeoDataFrame(
    {'Geometry': points, 'Property1': np.random.randn(n)},
    geometry='Geometry')

points.plot()
lines.plot()

Now get the distance from points to lines and only save the minimum distance for each point (see below for a version with apply):

min_dist = np.empty(n)
for i, point in enumerate(points):
    min_dist[i] = np.min([point.distance(line) for line in lines])
df_points['min_dist_to_lines'] = min_dist
df_points.head(3)

which gives

    Geometry                                       Property1    min_dist_to_lines
0   POINT (0.2479424516236574 2.944916965334865)    2.621823    0.193293
1   POINT (1.465768457667432 2.605673714922998)     0.6074484   0.226353
2   POINT (2.831645235202689 1.125073838462032)     0.657191    1.940127

---- EDIT ----

(taken from a github issue) Using apply is nicer and more consistent with how you'd do it in pandas:

def min_distance(point, lines):
    return lines.distance(point).min()

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, df_lines)

EDIT: As of at least 2019-10-04 it seems that a change in pandas requires a different input in the last code block, making use of the args parameter in .apply():

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))
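For reference, the `args=` form can be tried end to end with a small sketch. The coordinates below are made up for illustration, and `df_lines` is assumed here to be a GeoDataFrame of lines, since the snippets above never define it.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Hypothetical toy data: two vertical lines and three points.
df_lines = gpd.GeoDataFrame(
    geometry=[LineString([(0, 0), (0, 10)]), LineString([(5, 0), (5, 10)])]
)
df_points = gpd.GeoDataFrame(geometry=[Point(1, 5), Point(4, 2), Point(8, 8)])


def min_distance(point, lines):
    # Distance from one point to every line, reduced to the smallest value.
    return lines.distance(point).min()


# Extra positional arguments for min_distance go in the args tuple;
# note the trailing comma that makes (df_lines,) a one-element tuple.
df_points["min_dist_to_lines"] = df_points.geometry.apply(
    min_distance, args=(df_lines,)
)
print(df_points)
```

The trailing comma matters: `(df_lines)` without it is just `df_lines` in parentheses, not a tuple.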

I will use two sample datasets in geopandas with different dimensions to demonstrate.

import geopandas as gpd

# read geodata for five nyc boroughs
gdf_nyc = gpd.read_file(gpd.datasets.get_path('nybb'))
# read geodata for international cities
gdf_cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))

# convert to a meter projection
gdf_nyc.to_crs(epsg=3857, inplace=True)
gdf_cities.to_crs(epsg=3857, inplace=True)

We can simply apply a lambda function to the GeoSeries. For example, to get the minimal distance between each NYC borough (polygon) and its nearest international city (point), we can do the following:

gdf_nyc.geometry.apply(lambda x: gdf_cities.distance(x).min())

This will give us

0    384422.953323
1    416185.725507
2    412520.308816
3    419511.323677
4    440292.945096
Name: geometry, dtype: float64

Similarly, to get the minimal distance between each international city and its nearest NYC borough, we can do the following:

gdf_cities.geometry.apply(lambda x: gdf_nyc.distance(x).min())

This will give us

0      9.592104e+06
1      9.601345e+06
2      9.316354e+06
3      8.996945e+06
4      2.614927e+07
           ...     
197    1.177410e+07
198    2.377188e+07
199    8.559704e+06
200    8.902146e+06
201    2.034579e+07
Name: geometry, Length: 202, dtype: float64
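Since the question asks for an equivalent of a near table, it may also be useful to record which feature is the nearest one, not just how far away it is. The following is a rough sketch of one way to do that with the same apply pattern, using `idxmin()` on the distances. The `areas` and `cities` frames and their values are hypothetical stand-ins for the borough and city datasets.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical stand-ins for the borough polygons and city points.
areas = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(10, 0, 11, 1)],
)
cities = gpd.GeoDataFrame(
    {"city": ["X", "Y", "Z"]},
    geometry=[Point(3, 0.5), Point(8, 0.5), Point(20, 0.5)],
)

# For each area: index label of the closest city, and the distance to it.
nearest_idx = areas.geometry.apply(lambda g: cities.distance(g).idxmin())
nearest_dist = areas.geometry.apply(lambda g: cities.distance(g).min())

# Assemble a small "near table" with one row per area.
near_table = pd.DataFrame(
    {
        "name": areas["name"],
        "nearest_city": cities["city"].loc[nearest_idx.tolist()].to_numpy(),
        "distance": nearest_dist,
    }
)
print(near_table)
```

This computes the distances twice for clarity; for large datasets a single pass per geometry would be cheaper.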

Notes:

  1. Before calculating distance, convert your GeoDataFrame to a Cartesian projection. In the example, I used epsg:3857, so the distance will be in meters. If you use an ellipsoidal (lon/lat based) projection, the result will be in degrees. Convert your projection first, before anything else such as getting the centroids of your polygons.
  2. There is only one distance between two points. The minimal distance returned by the .distance() method makes sense when you want the distance between, let's say, a point and a line. In other words, the .distance() method can calculate the distance between any two geo-objects.
  3. When you have more than one geometry column in a GeoDataFrame, make sure to apply the lambda function to the desired GeoSeries and also call the .distance() method from the desired GeoSeries. In the example, I called the method from the GeoDataFrame directly because both of them have only one GeoSeries column.
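The first note can be checked with a small sketch. The two points below are made up for illustration: they sit on the equator, one degree of longitude apart, so the effect of reprojecting is easy to see.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical points on the equator, one degree of longitude apart.
# EPSG:4326 is a geographic (lon/lat) CRS.
a = gpd.GeoSeries([Point(0, 0)], crs="EPSG:4326")
b = gpd.GeoSeries([Point(1, 0)], crs="EPSG:4326")

# In a geographic CRS the result is in degrees
# (recent geopandas versions warn about this).
dist_degrees = a.distance(b).iloc[0]

# After reprojecting to EPSG:3857 the result is in that CRS's units (meters).
a_m = a.to_crs(epsg=3857)
b_m = b.to_crs(epsg=3857)
dist_meters = a_m.distance(b_m).iloc[0]

print(dist_degrees, dist_meters)
```

Keep in mind that EPSG:3857 (Web Mercator) stretches distances increasingly away from the equator, so for real measurements a projected CRS suited to the study area is probably a better choice.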

Your code is missing a detail, args=(df_lines,)

def min_distance(point, lines):
    return lines.distance(point).min()

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))  # Notice the change to this line
