
Python: mask netCDF data using a shapefile

I am using the following packages:

import pandas as pd
import numpy as np
import xarray as xr
import geopandas as gpd

I have the following object storing the data:

print(precip_da)

Out[]:
    <xarray.DataArray 'precip' (time: 13665, latitude: 200, longitude: 220)>
    [601260000 values with dtype=float32]
    Coordinates:
      * longitude  (longitude) float32 35.024994 35.074997 35.125 35.175003 ...
      * latitude   (latitude) float32 5.0249977 5.074997 5.125 5.174999 ...
      * time       (time) datetime64[ns] 1981-01-01 1981-01-02 1981-01-03 ...
    Attributes:
        standard_name:       convective precipitation rate
        long_name:           Climate Hazards group InfraRed Precipitation with St...
        units:               mm/day
        time_step:           day
        geostatial_lat_min:  -50.0
        geostatial_lat_max:  50.0
        geostatial_lon_min:  -180.0
        geostatial_lon_max:  180.0

This looks as follows:

precip_da.mean(dim="time").plot()

[Figure: mean precipitation over north-eastern Ethiopia]

I have loaded my shapefile into a geopandas.GeoDataFrame, which represents a single polygon.

awash = gpd.read_file(shp_dir)

awash
Out[]:
  OID_         Name      FolderPath  SymbolID  AltMode Base  Clamped Extruded  Snippet PopupInfo Shape_Leng  Shape_Area  geometry
0     0 Awash_Basin Awash_Basin.kml         0        0  0.0       -1        0     None      None  30.180944    9.411263  POLYGON Z ((41.78939511000004 11.5539922500000...

It looks as follows:

awash.plot()

[Figure: the region shapefile stored as a geopandas.GeoDataFrame]

Plotting one on top of the other, they look like this:

ax = awash.plot(alpha=0.2, color='black')
precip_da.mean(dim="time").plot(ax=ax,zorder=-1)

[Figure: the Awash region overlaid on the precipitation data]

My question is: how do I mask the xarray.DataArray by checking whether the lat-lon points lie INSIDE the shapefile stored as a geopandas.GeoDataFrame?

In other words, I want ONLY the precipitation values (mm/day) that fall INSIDE that shapefile.

I want to do something like the following:

masked_precip = precip_da.within(awash)

OR

masked_precip = precip_da.loc[precip_da.isin(awash)]
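As far as I know, neither line works as written with a GeoDataFrame; what they ask for is a point-in-polygon test at every grid cell. As a minimal sketch of that idea, assuming a single exterior ring with no holes, the hypothetical helper `polygon_mask` below applies the even-odd (ray-casting) rule with plain NumPy and returns a 2-D boolean mask on the latitude/longitude grid:

```python
import numpy as np


def polygon_mask(lats, lons, ring):
    """Return a (len(lats), len(lons)) boolean array that is True where the
    grid point (lon, lat) lies inside the polygon described by `ring`.

    `ring` is a sequence of (lon, lat[, z]) vertices of ONE exterior ring;
    holes and multipolygons are not handled in this sketch.
    """
    lon2d, lat2d = np.meshgrid(np.asarray(lons), np.asarray(lats))
    inside = np.zeros(lon2d.shape, dtype=bool)
    ring = np.asarray(ring, dtype=float)
    x0, y0 = ring[:, 0], ring[:, 1]          # any z column is ignored
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)  # edge end points
    # Even-odd rule: cast a ray towards +lon from every grid point and flip
    # `inside` each time that ray crosses an edge of the ring.
    for xa, ya, xb, yb in zip(x0, y0, x1, y1):
        if ya == yb:
            continue  # horizontal (or zero-length) edges never cross the ray
        crosses = (ya > lat2d) != (yb > lat2d)
        x_cross = xa + (lat2d - ya) * (xb - xa) / (yb - ya)
        inside ^= crosses & (lon2d < x_cross)
    return inside
```

With the objects from the question it could be applied as `mask = polygon_mask(precip_da.latitude, precip_da.longitude, awash.geometry[0].exterior.coords)` followed by `precip_da.where(xr.DataArray(mask, coords={'latitude': precip_da.latitude, 'longitude': precip_da.longitude}, dims=('latitude', 'longitude')))`. The loop runs over polygon edges and is vectorised over grid points, so a very detailed outline will make it slower.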

EDIT 1

I have thought about using the rasterio.mask module, but I don't know what format the input data needs to be in. It sounds as if it does exactly the right thing:

"Creates a masked or filled array using input shapes. Pixels are masked or set to nodata outside the input shapes."

Reposted from GIS Stack Exchange here.
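The "masked or filled" wording in the rasterio.mask description refers to two ways of representing the result: a NumPy masked array, which keeps every value but flags the outside pixels as invalid, or a plain array in which the outside pixels are overwritten with a nodata value. To my understanding `rasterio.mask.mask` chooses between them with its `filled` argument and expects an open rasterio dataset rather than an xarray object. The toy NumPy example below (arrays made up for illustration, with the boolean `inside` mask assumed to be known already) shows the two representations:

```python
import numpy as np

# Toy 3x3 precipitation grid (mm/day) and a boolean "inside the shape" mask.
precip = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
inside = np.array([[False, True, False],
                   [True, True, True],
                   [False, True, False]])

# "Masked": keep every value but flag the outside pixels as invalid.
masked = np.ma.masked_array(precip, mask=~inside)

# "Filled": overwrite the outside pixels with a nodata value (NaN here).
nodata = np.nan
filled = masked.filled(nodata)

print(masked.mean())       # 5.0, mean over the inside pixels only
print(np.nanmean(filled))  # 5.0, the same mean computed from the filled array
```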

This is the current working solution, which I took from this gist. It is Stephan Hoyer's answer to a GitHub issue on the xarray project.

In addition to the packages above, both affine and rasterio are required:

from rasterio import features
from affine import Affine

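def transform_from_latlon(lat, lon):
    """Input 1-D arrays of lat / lon (cell centres) and output an Affine
    transformation. rasterio treats the transform origin as the corner of
    the first pixel, so the origin is shifted back by half a cell.
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    dlon = lon[1] - lon[0]
    dlat = lat[1] - lat[0]
    trans = Affine.translation(lon[0] - dlon / 2, lat[0] - dlat / 2)
    scale = Affine.scale(dlon, dlat)
    return trans * scale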

def rasterize(shapes, coords, latitude='latitude', longitude='longitude',
              fill=np.nan, **kwargs):
    """Rasterize a list of (geometry, fill_value) tuples onto the given
    xarray coordinates. This only works for 1-D latitude and longitude
    arrays.

    usage:
    -----
    1. read shapefile to geopandas.GeoDataFrame
          `states = gpd.read_file(shp_dir+shp_file)`
    2. encode the different polygons in the shapefile as different
        numbers, i.e. 0.0, 1.0, ..., and use np.nan everywhere else
          `shapes = list(zip(states.geometry, range(len(states))))`
    3. Assign this to a new coord in your original xarray.DataArray
          `ds['states'] = rasterize(shapes, ds.coords, longitude='X', latitude='Y')`

    arguments:
    ---------
    : **kwargs (dict): passed to the `rasterio.features.rasterize` function

    attrs:
    -----
    :transform (affine.Affine): maps (column, row) pixel indices to
      (longitude, latitude)
    :raster (numpy.ndarray): output of rasterio.features.rasterize, with the
      values outside the shapes filled with np.nan
    :spatial_coords (dict): {latitude: coords[latitude], longitude: coords[longitude]},
      keyed by the coordinate names, with xr.DataArray as values

    returns:
    -------
    :(xr.DataArray): DataArray whose `values` are the polygon id inside a shape
      and nan outside, with the latitude and longitude coords named by the
      `latitude` / `longitude` arguments.


    """
    transform = transform_from_latlon(coords[latitude], coords[longitude])
    out_shape = (len(coords[latitude]), len(coords[longitude]))
    raster = features.rasterize(shapes, out_shape=out_shape,
                                fill=fill, transform=transform,
                                dtype=float, **kwargs)
    spatial_coords = {latitude: coords[latitude], longitude: coords[longitude]}
    return xr.DataArray(raster, coords=spatial_coords, dims=(latitude, longitude))

def add_shape_coord_from_data_array(xr_da, shp_path, coord_name):
    """ Create a new coord for the xr_da indicating whether or not it
         is inside the shapefile

        Creates a new coord - "coord_name" - whose values are the polygon
         ids (0.0, 1.0, ...; NaN outside every polygon) used to subset
         xr_da for plotting / analysis.

        Usage:
        -----
        precip_da = add_shape_coord_from_data_array(precip_da, "awash.shp", "awash")
        awash_da = precip_da.where(precip_da.awash==0, other=np.nan)
    """
    # 1. read in shapefile
    shp_gpd = gpd.read_file(shp_path)

    # 2. create a list of tuples (shapely.geometry, id)
    #    this allows for many different polygons within a .shp file (e.g. States of US)
    shapes = [(shape, n) for n, shape in enumerate(shp_gpd.geometry)]

    # 3. create a new coord in the xr_da which will be set to the id in `shapes`
    xr_da[coord_name] = rasterize(shapes, xr_da.coords, 
                               longitude='longitude', latitude='latitude')

    return xr_da
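The `transform_from_latlon` helper turns the 1-D coordinate arrays into an affine transform for rasterio. As I understand rasterio's convention, the transform maps the corner of pixel (0, 0), so a pixel centre sits at (col + 0.5, row + 0.5); since xarray latitude/longitude values are cell centres, a transform anchored directly at `lon[0], lat[0]` lands half a cell away from the grid points. The plain-Python arithmetic below (no `affine` dependency; `make_transform` is a hypothetical helper and the coordinates are rounded versions of the ones printed in the question) shows the effect of anchoring half a cell earlier:

```python
def make_transform(lon0, lat0, dlon, dlat):
    """Return a function mapping pixel coordinates (col, row) to (lon, lat),
    where (0, 0) is the corner of the first pixel."""
    def apply(col, row):
        return lon0 + col * dlon, lat0 + row * dlat
    return apply


# Cell-centre coordinates with 0.05 degree spacing, like the grid above.
lons = [35.025, 35.075, 35.125]
lats = [5.025, 5.075, 5.125]
dlon = lons[1] - lons[0]
dlat = lats[1] - lats[0]

# Anchored at the first cell centre: the centre of pixel (0, 0) is shifted.
naive = make_transform(lons[0], lats[0], dlon, dlat)
# Anchored half a cell earlier: pixel centres land on the grid points.
shifted = make_transform(lons[0] - dlon / 2, lats[0] - dlat / 2, dlon, dlat)

print(naive(0.5, 0.5))    # half a cell away from (lons[0], lats[0])
print(shifted(0.5, 0.5))  # (lons[0], lats[0]) up to floating-point rounding
```

At 0.05 degree resolution a half-cell shift is about 2.5 km, which matters for cells along the basin boundary.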

It can be used as follows:

precip_da = add_shape_coord_from_data_array(precip_da, shp_dir, "awash")
awash_da = precip_da.where(precip_da.awash==0, other=np.nan)
awash_da.mean(dim="time").plot()
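`rasterize` produces a 2-D raster of polygon ids (NaN outside every polygon), and `where(precip_da.awash == 0, other=np.nan)` broadcasts that 2-D condition over the `time` dimension before the time mean is taken. A NumPy sketch of the same selection on toy arrays (shapes and values made up for illustration; xarray aligns by dimension name, whereas NumPy relies on the trailing axes matching):

```python
import numpy as np

# Toy data: 2 time steps on a 2x3 grid, ordered (time, latitude, longitude).
precip = np.arange(12, dtype=float).reshape(2, 2, 3)

# Toy polygon-id raster like the output of `rasterize`: ids 0 and 1,
# NaN for cells outside every polygon.
ids = np.array([[0.0, 0.0, np.nan],
                [1.0, 0.0, np.nan]])

# Analogue of `precip_da.where(precip_da.awash == 0, other=np.nan)`:
# the (lat, lon) condition broadcasts across the leading time axis.
# NaN == 0 is False, so cells outside every polygon are dropped too.
basin = np.where(ids == 0, precip, np.nan)

# Analogue of `.mean(dim="time")`: cells outside polygon 0 are NaN at
# every time step, so their time mean is NaN as well.
basin_mean = basin.mean(axis=0)
```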

[Figure: mean rainfall over the Awash basin, Ethiopia]

You should have a look at the following packages:

Either may get you what you want.
