
Python point-in-polygon operation: join gridded data with point data, based on the points falling inside the grid

I want to know how to select values from the xarray DataArray based on the location (geo_df.geometry) and time (geo_df.plant_date and geo_df.cut_date) of rows in the geopandas GeoDataFrame. I want to join them as 'features' in an output GeoDataFrame.

My datasets:

Packages I'm using:

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import geometry
import xarray as xr

I have a GeoDataFrame storing lat/lon POINTS, each of which corresponds to a household. The index column is the id of the household.

geo_df.head()

Out[]:
  crop_name     xxx     cut_date plant_date                       geometry
0   SORGHUM  0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)
1    MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)
2   SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)

I have an xarray object storing GRIDDED vegetation health data (NDVI).

ndvi_df = xr.open_dataset(geo_data_dir+ndvi_dir).ndvi

Out[]: <xarray.DataArray 'ndvi' (time: 212, lat: 200, lon: 220)>
[9328000 values with dtype=float32]
Coordinates:
  * lon      (lon) float32 35.024994 35.074997 35.125 35.174988 35.22499 ...
  * lat      (lat) float32 14.974998 14.924995 14.875 14.824997 14.775002 ...
  * time     (time) datetime64[ns] 2000-02-14 2000-03-16 2000-04-15 ...
Attributes:
    long_name:   Normalized Difference Vegetation Index
    units:       1
    _fillvalue:  -3000

I have a GeoDataFrame storing a POLYGON which corresponds to a country.

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ethiopia = world.loc[world["name"] == "Ethiopia"]

Visual Summary:

Plotted on top of one another, my datasets look as follows (plotted annually for demonstration purposes).

year = 2011  # example value (assumed); any year covered by both datasets works

# mean NDVI over the year
(ndvi_df.loc[f'{year}-01-16T00:00:00.000000000':f'{year}-12-16T00:00:00.000000000']
 .mean(dim='time')
 .plot(cmap='gist_earth_r', vmin=-0.1, vmax=1)
)

ax = plt.gca()

# country outline
ethiopia.plot(alpha=0.2, color='black', ax=ax)

# households whose crop was cut within the year
(geo_df
 .loc[(geo_df["cut_date"] > f'{year}-01-01') & (geo_df["cut_date"] < f'{year+1}-01-01')]
 .plot(markersize=6, ax=ax, color="#FEF731")
)
ax.set_title(f'{year} Mean NDVI and Households')
plt.show()

[Figure: household data plotted on top of the gridded NDVI product, shaded with the Ethiopia shapefile.]

Ideal Output:

As output, I want a GeoDataFrame with extra columns giving the NDVI values in the PRECEDING MONTHS for the pixel that each household falls inside.

The index column is the id of the household.

Like this:

  crop_name     xxx     cut_date plant_date                       geometry  ndvi_month_0  ndvi_month_1  ndvi_month_2
0   SORGHUM  0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)          0.3           0.3           0.3
1    MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)          0.6           0.6           0.6
2   SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)          0.1           0.1           0.1

I would also like to know how to subset the data in my xarray object using the GeoDataFrame polygon ethiopia.

(Also posted on GIS Stack Exchange.)

With help from @om_henners, there is a working solution to this question.

The following function can be applied to the geopandas.GeoDataFrame object. For each row, it selects the 12 months preceding cut_date and takes the NEAREST grid value to that row's lat,lon point.

def geo_var_for_point(row, geovar_df, geovar_name):
    """
      Return a pandas series of geovariable values (NDVI or LST), with one
        labelled entry per time step in the selected window.

      Usage:
      -----
      `geo_df.apply(geo_var_for_point, axis=1, **{"geovar_df": ndvi_df, "geovar_name": "ndvi"})`

      Arguments:
      ---------
      :row (pandas.Series) : one row of the GeoDataFrame, with `geometry` and `cut_date` cols
      :geovar_df (xarray.DataArray): the geographic variable you want information from
      :geovar_name (str): prefix used to label the output columns with the geovariable

      Returns:
      -------
      :(pd.Series) : series object of geovar values for the 12 months prior to cut_date

      Variables:
      ---------
      :point (shapely.Point): geometry of the point (x, y coords)
      :cut_date (pd.Timestamp): the date at which the crop was cut
      :start_date (pd.Timestamp): the first month to select geovars from
    """
    # get the times
    cut_date = row['cut_date']
    start_date = cut_date - pd.DateOffset(months=12)

    # subset the geovar dataframe by time
    limited_geovar = geovar_df.loc[start_date: cut_date]

    # get the location
    point = row['geometry']

    # select the values from the xarray.DataArray for that location
    series = limited_geovar.sel(lat=point.y, lon=point.x, method='nearest').to_series()

    # create the output with columns labelled
    columns = [f"{geovar_name}_month_t-{i}" for i in np.arange(len(series))]
    return pd.Series(series.values , index=columns)
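To see the two selection steps in isolation, here is a minimal sketch on a synthetic grid. The array `demo`, its coordinates and the query point are made up for illustration (they are not the NDVI data from the question), and the time slice is written with `.sel(time=slice(...))`, which should behave like the `.loc[start_date: cut_date]` call in the function.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the NDVI grid (assumed values, not real data):
# 24 monthly time steps on a 3 x 4 lat/lon grid.
times = pd.date_range("2010-01-01", periods=24, freq="MS")
lats = np.array([15.0, 14.5, 14.0])  # descending, as in the question's data
lons = np.array([37.0, 37.5, 38.0, 38.5])
values = np.arange(24 * 3 * 4, dtype="float32").reshape(24, 3, 4)
demo = xr.DataArray(
    values,
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
    name="ndvi",
)

cut_date = pd.Timestamp("2011-11-10")
start_date = cut_date - pd.DateOffset(months=12)

# Step 1: label-based time slice; both end points are inclusive.
window = demo.sel(time=slice(start_date, cut_date))

# Step 2: pick the grid cell whose centre is nearest to the point.
cell = window.sel(lat=14.35, lon=37.89, method="nearest")
series = cell.to_series()

print(len(series))                       # number of time steps in the window
print(cell.lat.item(), cell.lon.item())  # coordinates of the chosen cell
```

Note that `method="nearest"` always returns a value, even for a point far outside the grid, so it may be worth checking that every household lies within the grid extent first.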

This function can be applied like so:

ndvi_extract = geo_df.head().apply(geo_var_for_point, axis=1, **{"geovar_df":ndvi_df, "geovar_name": "ndvi"})

Which returns:

  ndvi_month_t-0  ndvi_month_t-1  ndvi_month_t-2  ndvi_month_t-3  ndvi_month_t-4  ndvi_month_t-5  ndvi_month_t-6  ndvi_month_t-7  ndvi_month_t-8  ndvi_month_t-9  ndvi_month_t-10 ndvi_month_t-11
0         0.3141          0.2559          0.2287          0.2056          0.1993          0.2015          0.1970          0.2187          0.2719          0.3669           0.4647          0.3563
1         0.3141          0.2559          0.2287          0.2056          0.1993          0.2015          0.1970          0.2187          0.2719          0.3669           0.4647          0.3563
2         0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801          0.4338          0.5667           0.4209          0.2880
3         0.2866          0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801          0.4338           0.5667          0.4209
4         0.4044          0.2866          0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801           0.4338          0.5667
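Row-wise `apply` calls `.sel` once per household. For larger frames, one possible alternative is xarray's pointwise (vectorized) indexing, which looks up all locations in a single call. The sketch below only covers the spatial lookup on a synthetic grid; the household coordinates and the dimension name `household` are hypothetical, and the per-row time window would still have to be applied afterwards.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic grid (assumed values): 6 monthly steps on a 3 x 4 lat/lon grid.
times = pd.date_range("2010-01-01", periods=6, freq="MS")
lats = np.array([15.0, 14.5, 14.0])
lons = np.array([37.0, 37.5, 38.0, 38.5])
values = np.arange(6 * 3 * 4, dtype="float32").reshape(6, 3, 4)
demo = xr.DataArray(
    values,
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
)

# Hypothetical household coordinates; with real data these could come from
# geo_df.geometry.x and geo_df.geometry.y.
household_x = [37.89, 37.10, 38.40]
household_y = [14.35, 14.90, 14.05]

# Indexers that share a new dimension are matched pairwise (point by point)
# instead of forming the outer product of lats and lons.
x_idx = xr.DataArray(household_x, dims="household")
y_idx = xr.DataArray(household_y, dims="household")

per_household = demo.sel(lon=x_idx, lat=y_idx, method="nearest")
print(per_household.sizes)  # one full time series per household
```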

The extracted columns can then be concatenated to the original dataframe:

pd.concat([geo_df.head(), ndvi_extract.head()], axis=1)

This returns a geopandas.GeoDataFrame with the geovariable values for each point, taken from the gridded product.
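The function above does not cover the second part of the question, subsetting the gridded data by the ethiopia polygon. One way this might be done is to test each grid-cell centre against the polygon with shapely and mask the DataArray with the result. The sketch below uses a synthetic grid and a made-up square polygon `region`; both are assumptions for illustration.

```python
import numpy as np
import xarray as xr
from shapely.geometry import Point, Polygon

# Synthetic 3 x 4 grid (assumed values, not the NDVI data).
lats = np.array([15.0, 14.5, 14.0])
lons = np.array([37.0, 37.5, 38.0, 38.5])
demo = xr.DataArray(
    np.ones((3, 4), dtype="float32"),
    coords={"lat": lats, "lon": lons},
    dims=("lat", "lon"),
)

# Hypothetical square region standing in for the country polygon.
region = Polygon([(37.25, 13.75), (38.25, 13.75), (38.25, 14.75), (37.25, 14.75)])

# True where the cell centre falls inside the polygon (x = lon, y = lat).
inside = np.array(
    [[region.contains(Point(float(lon), float(lat))) for lon in lons] for lat in lats]
)
mask = xr.DataArray(inside, coords={"lat": lats, "lon": lons}, dims=("lat", "lon"))

# Cells outside the polygon become NaN; the grid shape is unchanged.
clipped = demo.where(mask)
print(int(clipped.notnull().sum()))  # number of cells kept
```

With the real data, the polygon could presumably be taken from `ethiopia.geometry.iloc[0]`, and `ndvi_df.where(mask)` would broadcast the 2-D mask over the time dimension. The Python loop is fine for a 200 x 220 grid but would be slow for much larger ones.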
