python Point-In-Polygon operation. Join gridded data with point data, based on those points falling inside the grid
I want to know how to select values from an xarray DataArray based on the location (geo_df.geometry) and time (geo_df.plant_date & geo_df.cut_date) of rows in a geopandas GeoDataFrame. I want to join them as 'features' in an output GeoDataFrame.
Packages I'm using:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import geometry
import xarray as xr
I have a GeoDataFrame storing lat/lon POINTS which correspond to households. The index column is the id of the households.
geo_df.head()
Out[]:
crop_name xxx cut_date plant_date geometry
0 SORGHUM 0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)
1 MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)
2 SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)
I have an xarray object storing GRIDDED vegetation health data (NDVI).
ndvi_df = xr.open_dataset(geo_data_dir+ndvi_dir).ndvi
Out[]: <xarray.DataArray 'ndvi' (time: 212, lat: 200, lon: 220)>
[9328000 values with dtype=float32]
Coordinates:
* lon (lon) float32 35.024994 35.074997 35.125 35.174988 35.22499 ...
* lat (lat) float32 14.974998 14.924995 14.875 14.824997 14.775002 ...
* time (time) datetime64[ns] 2000-02-14 2000-03-16 2000-04-15 ...
Attributes:
long_name: Normalized Difference Vegetation Index
units: 1
_fillvalue: -3000
I have a GeoDataFrame storing a POLYGON which corresponds to a country.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ethiopia = world.loc[world["name"] == "Ethiopia"]
My datasets plotted on top of one another look as follows (plotted annually for demonstration purposes).
(ndvi_df.loc[f'{year}-01-16T00:00:00.000000000':f'{year}-12-16T00:00:00.000000000']
.mean(dim='time')
.plot(cmap='gist_earth_r', vmin=-0.1, vmax=1)
)
ax = plt.gca()
ethiopia.plot(alpha=0.2, color='black', ax=ax)
(geo_df
.loc[(geo_df["cut_date"] > f'{year}-01-01') & (geo_df["cut_date"] < f'{year+1}-01-01')]
.plot(markersize=6 ,ax=ax, color="#FEF731")
)
ax.set_title(f'{year} Mean NDVI and Households')
plt.show()
As output, I want a GeoDataFrame with extra columns giving the NDVI values in the PRECEDING MONTHS for the pixel that each household falls inside.
The index column is the id of the households.
Like this:
crop_name xxx cut_date plant_date geometry ndvi_month_0 ndvi_month_1 ndvi_month_2
0 SORGHUM 0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619) 0.3 0.3 0.3
1 MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619) 0.6 0.6 0.6
2 SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619) 0.1 0.1 0.1
I would also like to know how to subset the data in my xarray object using the GeoDataFrame polygon ethiopia.
(reposted on GIS Stack Exchange here)
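For the second question (subsetting the grid by the ethiopia polygon), one simple starting point is to clip the DataArray to the polygon's bounding box. The sketch below is only an illustration under assumptions: clip_to_bounds is a hypothetical helper, the grid is synthetic, and the bounds tuple is made up (with the real data it would come from ethiopia.total_bounds). It keeps a rectangle, not the exact country outline; masking to the true outline would additionally need a point-in-polygon test per grid cell.

```python
import numpy as np
import xarray as xr


def clip_to_bounds(da, bounds):
    """Keep only grid cells whose centres fall inside a bounding box.

    bounds is (minx, miny, maxx, maxy), the same order that
    GeoDataFrame.total_bounds uses. (Hypothetical helper.)
    """
    minx, miny, maxx, maxy = bounds
    inside = (
        (da.lon >= minx) & (da.lon <= maxx)
        & (da.lat >= miny) & (da.lat <= maxy)
    )
    # drop=True discards rows/columns that lie entirely outside the box
    return da.where(inside, drop=True)


# Synthetic stand-in for the NDVI grid (descending lat, like the real data)
lat = np.arange(15.0, 5.0, -1.0)   # 15, 14, ..., 6
lon = np.arange(30.0, 40.0, 1.0)   # 30, 31, ..., 39
demo = xr.DataArray(
    np.random.rand(3, lat.size, lon.size),
    coords={"time": np.arange(3), "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# Made-up bounds; with real data use tuple(ethiopia.total_bounds)
clipped = clip_to_bounds(demo, (33.0, 8.0, 36.0, 12.0))
print(clipped.sizes)
```

A boolean mask is used instead of a coordinate slice so the helper works whether lat is stored ascending or descending.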
So, after help from @om_henners here, there is a working solution to this question.
The following function can be applied to the geopandas.GeoDataFrame object. It selects the 12 months preceding cut_date and takes the NEAREST grid value for the lat, lon point of each row in the GeoDataFrame.
def geo_var_for_point(row, geovar_df, geovar_name):
    """
    Return a pandas series of geovariable values (NDVI or LST) for one row,
    labelled `{geovar_name}_month_t-{i}`.
    Usage:
    -----
    `geo_df.apply(geo_var_for_point, axis=1, **{"geovar_df": ndvi_df, "geovar_name": "ndvi"})`
    Arguments:
    ---------
    :row (pd.Series): one row of the GeoDataFrame, with `geometry` and `cut_date` cols
    :geovar_df (xarray.DataArray): the geographic variable you want information from
    :geovar_name (str): prefix used to label the output columns
    Returns:
    -------
    :(pd.Series) : series object of geovar values for the 12 months prior to cut_date
    Variables:
    ---------
    :point (shapely.Point): geometry of the point (x, y coords)
    :cut_date (pd.Timestamp): the date at which the crop was cut
    :start_date (pd.Timestamp): the first month to select geovars from
    """
    # get the times
    cut_date = row['cut_date']
    start_date = cut_date - pd.DateOffset(months=12)
    # subset the geovar DataArray by time (time is its first dimension)
    limited_geovar = geovar_df.loc[start_date:cut_date]
    # get the location
    point = row['geometry']
    # select the values from the xarray.DataArray for the nearest grid cell
    series = limited_geovar.sel(lat=point.y, lon=point.x, method='nearest').to_series()
    # create the output with columns labelled
    columns = [f"{geovar_name}_month_t-{i}" for i in range(len(series))]
    return pd.Series(series.values, index=columns)
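To make the two selection steps inside the function easier to follow, here is a minimal sketch on a synthetic grid. Everything in it is an assumption for illustration: the 3x3 grid, the monthly time axis, and the values, which simply encode the time index so the selected window can be checked by eye. The point and cut date are taken from the first row of geo_df shown in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly grid: every pixel holds its own time index (0, 1, 2, ...)
time = pd.date_range("2010-01-01", periods=36, freq="MS")
lat = np.array([15.0, 14.5, 14.0])
lon = np.array([37.5, 38.0, 38.5])
values = np.arange(time.size, dtype=float)[:, None, None] * np.ones((1, 3, 3))
da = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="ndvi",
)

# Step 1: restrict to the 12 months before the cut date
cut_date = pd.Timestamp("2011-11-10")
start_date = cut_date - pd.DateOffset(months=12)
window = da.sel(time=slice(start_date, cut_date))

# Step 2: pick the grid cell whose centre is nearest to the point
pixel = window.sel(lat=14.35381619, lon=37.89087631, method="nearest")
series = pixel.to_series()

print(float(pixel.lat), float(pixel.lon))  # centre of the chosen cell
print(series)                              # chronological, oldest first
```

The series comes out in chronological order, oldest first. If t-0 is meant to be the month closest to cut_date, the values would need reversing (for example series.values[::-1]) before the labels are attached; that is worth checking against your own data.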
This function can be applied like so:
ndvi_extract = geo_df.head().apply(geo_var_for_point, axis=1, **{"geovar_df":ndvi_df, "geovar_name": "ndvi"})
Which returns:
ndvi_month_t-0 ndvi_month_t-1 ndvi_month_t-2 ndvi_month_t-3 ndvi_month_t-4 ndvi_month_t-5 ndvi_month_t-6 ndvi_month_t-7 ndvi_month_t-8 ndvi_month_t-9 ndvi_month_t-10 ndvi_month_t-11
0 0.3141 0.2559 0.2287 0.2056 0.1993 0.2015 0.1970 0.2187 0.2719 0.3669 0.4647 0.3563
1 0.3141 0.2559 0.2287 0.2056 0.1993 0.2015 0.1970 0.2187 0.2719 0.3669 0.4647 0.3563
2 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667 0.4209 0.2880
3 0.2866 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667 0.4209
4 0.4044 0.2866 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667
Which can then be concatenated to the original dataframe:
pd.concat([geo_df.head(), ndvi_extract.head()], axis=1)
This will return a geopandas.GeoDataFrame with the geovariable values for that point from the gridded product.
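For a large number of households, the row-wise apply may be slow. A possible alternative, sketched here under assumptions, is to do the spatial lookup for all points in one call by passing DataArray indexers that share a dimension (named household here, a made-up name). The grid and coordinates are synthetic; with the real data the coordinates would come from geo_df.geometry.x and geo_df.geometry.y. This covers only the spatial step: the per-household 12-month window would still have to be cut afterwards.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the NDVI grid
time = pd.date_range("2010-01-01", periods=24, freq="MS")
lat = np.array([15.0, 14.5, 14.0])
lon = np.array([37.5, 38.0, 38.5])
grid = xr.DataArray(
    np.random.rand(time.size, lat.size, lon.size),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="ndvi",
)

# Made-up household coordinates (real data: geo_df.geometry.x / .y)
household = np.array([0, 1, 2])
points_x = xr.DataArray([37.89, 38.40, 37.55], dims="household",
                        coords={"household": household})
points_y = xr.DataArray([14.35, 14.05, 14.90], dims="household",
                        coords={"household": household})

# One vectorized nearest-neighbour lookup: lat/lon are replaced by 'household'
nearest = grid.sel(lon=points_x, lat=points_y, method="nearest")

# Wide table: one row per household, one column per time step
wide = nearest.transpose("household", "time").to_pandas()
print(wide.shape)
```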