Pandas: the most resource-efficient way to apply a function

Question

I have two dataframes: one has a column of points, the other a column of polygons. The data looks like this:

>>> df1
   Index            Point
0      1  POINT (100 400)
1      2  POINT (920 400)
2      3  POINT (111 222)

>>> df2
   Index    Area-ID                                            Polygon
0      1   New York  POLYGON ((226000 619000, 226000 619500, 226500...
1      2  Amsterdam  POLYGON ((226000 619000, 226000 619500, 226500...
2      3     Berlin  POLYGON ((226000 619000, 226000 619500, 226500...

Reproducible example:

import pandas as pd
import shapely.wkt

data = {'Index': [1, 2, 3],
        'Point': ['POINT (100 400)', 'POINT (920 400)', 'POINT (111 222)']}
df1 = pd.DataFrame(data)
df1['Point'] = df1['Point'].apply(shapely.wkt.loads)

data = {'Index': [1, 2, 3],
        'Area-ID': ['New York', 'Amsterdam', 'Berlin'],
        'Polygon': ['POLYGON ((90 390, 110 390, 110 410, 90 410, 90 390))',
                    'POLYGON ((890 390, 930 390, 930 410, 890 410, 890 390))',
                    'POLYGON ((110 220, 112 220, 112 225, 110 225, 110 220))']}
df2 = pd.DataFrame(data)
df2['Polygon'] = df2['Polygon'].apply(shapely.wkt.loads)

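Using shapely's 'polygon.contains' function, I can check whether a polygon contains a given point. The goal is to find the matching polygon for every point in dataframe 1.

The following approach works, but it takes far too long because the datasets are very large: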

for idx1, row1 in df1.iterrows():
    print(idx1)
    for _, row2 in df2.iterrows():
        if row2['Polygon'].contains(row1['Point']):
            # Assign via .loc; chained indexing (iloc[...][...]) would write to a copy
            df1.loc[idx1, 'Area-ID'] = row2['Area-ID']

Is there a more time-efficient way to achieve this?

Answer 1

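If every point is contained in a polygon (as in the current form of the question), you can do the following. Note that the `[0]` takes the first matching polygon and raises an IndexError when a point lies in no polygon: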

df1=\
df1.assign(cities=df1.Point.apply(lambda point:
                                    df2['Area-ID'].loc[
                                        [i for i, polygon in enumerate(df2.Polygon)
                                        if polygon.contains(point)][0]
                                        ]))

You will get:

   Index            Point     cities
0      1  POINT (100 400)   New York
1      2  POINT (920 400)  Amsterdam
2      3  POINT (111 222)     Berlin
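
If some points may fall outside every polygon, a variant along the following lines might help. This is only a sketch, not part of the original answer: the extra point `POINT (0 0)` and the helper name `find_area` are made up for illustration. It prepares each polygon once with `shapely.prepared.prep`, which can speed up repeated `contains` checks, and it returns `None` instead of raising when nothing matches.

```python
import pandas as pd
import shapely.wkt
from shapely.prepared import prep

# Same toy data as in the question, plus one hypothetical point
# (0 0) that lies in none of the polygons.
df1 = pd.DataFrame({
    'Index': [1, 2, 3, 4],
    'Point': ['POINT (100 400)', 'POINT (920 400)',
              'POINT (111 222)', 'POINT (0 0)'],
})
df1['Point'] = df1['Point'].apply(shapely.wkt.loads)

df2 = pd.DataFrame({
    'Index': [1, 2, 3],
    'Area-ID': ['New York', 'Amsterdam', 'Berlin'],
    'Polygon': ['POLYGON ((90 390, 110 390, 110 410, 90 410, 90 390))',
                'POLYGON ((890 390, 930 390, 930 410, 890 410, 890 390))',
                'POLYGON ((110 220, 112 220, 112 225, 110 225, 110 220))'],
})
df2['Polygon'] = df2['Polygon'].apply(shapely.wkt.loads)

# Prepare every polygon once, so the preparation cost is not paid per point.
prepared = [(area, prep(polygon))
            for area, polygon in zip(df2['Area-ID'], df2['Polygon'])]


def find_area(point):
    """Return the Area-ID of the first polygon containing point, else None."""
    return next((area for area, polygon in prepared
                 if polygon.contains(point)), None)


df1['cities'] = df1['Point'].apply(find_area)
print(df1)
```

This still tests each point against the polygons one by one until a match is found, so for very large inputs a spatial index (for example a spatial join in GeoPandas) is probably the next thing to try.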
