Accessing NetCDF values within a bounding polygon

I'm trying to access daily temperature values from a NetCDF file for analysis. I want to create temperature summaries (i.e., the total number of days within a temperature range) for different administrative units. I have a global .nc file and a shapefile with the administrative units.

My plan is to read through the temperature data by looping over lat, lon, and time (the three dimensions of the temperature variable) and save the desired data to a list. However, I can't work out how to restrict my count to only the pixels inside a specific polygon.
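The counting itself does not need a loop over every (time, lat, lon) element. NumPy can evaluate the range condition on the whole array at once and then sum along the time axis. Here is a minimal sketch. It assumes the temperature variable has already been read into a NumPy array ordered (time, lat, lon), which is common but worth checking against the variable's dimensions. It also assumes both range bounds are exclusive. The names days_in_range, t_min and t_max are hypothetical.

```python
import numpy as np

def days_in_range(temp, t_min, t_max):
    """Count, for every grid cell, the days with t_min < temp < t_max.

    temp is assumed to be ordered (time, lat, lon); the result has shape (lat, lon).
    """
    in_range = (temp > t_min) & (temp < t_max)   # boolean array, same shape as temp
    return in_range.sum(axis=0)                   # number of True values along time

# Tiny example: 3 days on a 2 x 2 grid.
temp = np.array([
    [[5.0, 15.0], [25.0, 12.0]],
    [[11.0, 18.0], [9.0, 30.0]],
    [[14.0, 2.0], [19.0, 10.0]],
])
result = days_in_range(temp, 10, 20)
print(result)
```

The polygon question then reduces to selecting which cells of this (lat, lon) result, or of the original array, belong to each administrative unit.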

I'm working with a state that has many administrative units, some of them fairly small, so using a bounding box instead of the exact shape of each polygon would not be ideal. All I need is to loop through the pixels within a unit, write them out somewhere else, and move on to the next unit.

Does anyone have suggestions on how I can loop through the polygons and read only the pixels within each one?

I've never worked with NetCDF files before, so I'm not really sure where to start. I can access the data itself fine, but I'm stuck on how to overlay the administrative units.
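A common stumbling block when overlaying a shapefile on a global grid is the longitude convention. Many global NetCDF files store longitudes from 0 to 360, whereas shapefiles usually use -180 to 180, so the two need to agree before any point-in-polygon test. Here is a minimal sketch, assuming 1D lat and lon coordinate arrays (the helper names are hypothetical). It converts the longitudes and builds 2D arrays holding the centre coordinates of every pixel, which can then be tested against a polygon.

```python
import numpy as np

def wrap_longitudes(lon):
    """Map longitudes from [0, 360) to [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0

def cell_centres(lat, lon):
    """Return 2D longitude and latitude arrays, one entry per pixel, shaped (lat, lon)."""
    lon2d, lat2d = np.meshgrid(lon, lat)   # default 'xy' indexing -> shape (lat.size, lon.size)
    return lon2d, lat2d

lon = wrap_longitudes(np.array([0.0, 90.0, 180.0, 270.0]))   # -> 0, 90, -180, -90
lat = np.array([-45.0, 45.0])
lon2d, lat2d = cell_centres(lat, lon)
print(lon2d.shape)
```

After wrapping, the longitude array is no longer sorted. For plotting, the longitudes, and the data along that axis, may need reordering with np.argsort.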

There are several algorithms for checking whether a point is inside a polygon; see e.g. https://en.wikipedia.org/wiki/Point_in_polygon . You can use one of them to generate a mask, which you can then use to calculate statistics over only the masked grid points.
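To illustrate the idea, here is a plain-Python sketch of one of the algorithms on that page, the even-odd (ray casting) rule, used to build a mask and average only the cells inside. It is an illustration, not the method of the code below, which uses the winding number instead. The two rules agree for simple, non-self-intersecting polygons, except possibly for points lying exactly on an edge. The function name and the toy grid are hypothetical.

```python
def point_in_polygon(x, y, poly_x, poly_y):
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings.

    The vertex lists do not need to repeat the first vertex at the end.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(poly_x)
    j = n - 1                      # previous vertex; starts at the last one to close the ring
    for i in range(n):
        xi, yi = poly_x[i], poly_y[i]
        xj, yj = poly_x[j], poly_y[j]
        # Does edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x-coordinate where that edge crosses the horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside

# Mask a 4 x 4 grid of cell centres and average only the cells inside a square.
lats = [0.5, 1.5, 2.5, 3.5]
lons = [0.5, 1.5, 2.5, 3.5]
data = [[r * 4 + c for c in range(4)] for r in range(4)]   # data[lat_index][lon_index]
sq_x, sq_y = [1, 3, 3, 1], [1, 1, 3, 3]

mask = [[point_in_polygon(lon, lat, sq_x, sq_y) for lon in lons] for lat in lats]
inside_vals = [data[r][c] for r in range(4) for c in range(4) if mask[r][c]]
print(sum(inside_vals) / len(inside_vals))
```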

Using some functions I wrote earlier, I quickly put together this example. Without Numba (the @jit decorators, which are commented out below), it works fine for small datasets but becomes very slow for large ones. With Numba, I get this timing for a 1024 x 1024 grid point dataset:

In [42]: timeit get_mask(mask, lons, lats, poly_x, poly_y)
37.7 ms ± 79.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That seems fine to me.
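The is_inside function below loops over size-1 segments, so it expects a closed ring in which the last vertex repeats the first, as in the dummy poly_x and poly_y. Polygon rings read from a shapefile are normally stored closed already. For hand-typed vertices, a small helper can enforce this. Here is a sketch; the name close_ring is hypothetical.

```python
import numpy as np

def close_ring(x, y):
    """Return float copies of the vertex arrays, appending the first vertex if the ring is open."""
    x = np.array(x, dtype=float)   # np.array (not asarray) so the caller's arrays are never modified
    y = np.array(y, dtype=float)
    if x[0] != x[-1] or y[0] != y[-1]:
        x = np.append(x, x[0])
        y = np.append(y, y[0])
    return x, y

# The dummy polygon from the example, typed without the repeated first vertex:
poly_x, poly_y = close_ring([2, 12, 9, 6], [4, 8.5, 15, 14])
print(poly_x, poly_y)
```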

import numpy as np
#from numba import jit

#@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

#@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate Euclidean distance.
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

#@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point (x,y) is exactly on line segment ((x1,y1) to (x2,y2))
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

#@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside the polygon; points exactly on an edge count as outside.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        #if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) == 0):
        #    return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    # Non-zero winding number means the point is inside
    return wn != 0

#@jit(nopython=True, nogil=True)
def get_mask(mask, x, y, poly_x, poly_y):
    """
    Generate mask for grid points inside polygon
    """

    for j in range(y.size):
        for i in range(x.size):
            # Use the function arguments, not the global lons/lats arrays
            if is_inside(x[i], y[j], poly_x, poly_y, poly_x.size):
                mask[j,i] = True

    return mask


if __name__ == '__main__':
    import matplotlib.pyplot as pl
    pl.close('all')

    # Dummy data.
    lats = np.linspace(0, 20, 16)
    lons = np.linspace(0, 16, 16)
    
    data = np.arange(lats.size*lons.size).reshape((lats.size, lons.size))
    
    # Polygon vertices (closed ring: the last vertex repeats the first).
    poly_x = np.array([2, 12, 9, 6, 2])
    poly_y = np.array([4, 8.5, 15, 14, 4])
    
    # Generate mask for calculating statistics.
    mask = np.zeros_like(data, dtype=bool)
    get_mask(mask, lons, lats, poly_x, poly_y)
    
    # Calculate statistics.
    max_val = data[mask].max()
    
    # Plot data and mask.
    pl.figure(figsize=(10,4))
    pl.subplot(121)
    pl.title('data')
    pl.pcolormesh(lons, lats, data)
    pl.plot(poly_x, poly_y)
    pl.colorbar()
    
    pl.subplot(122)
    pl.title('averaging mask, max_value={}'.format(max_val))
    pl.pcolormesh(lons, lats, mask)
    pl.plot(poly_x, poly_y)
    pl.colorbar()
    
    pl.tight_layout()
    pl.show()
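
If Numba is not available, another option is to vectorize over the grid with NumPy. The idea is to loop over the polygon's edges, which are usually far fewer than the grid points, and test all grid points against each edge at once. This sketch uses the even-odd rule rather than the winding number of is_inside above. It is therefore an alternative to get_mask, not a copy of it, and should agree with it except possibly for points lying exactly on an edge. The name grid_mask is hypothetical, and no timings are claimed.

```python
import numpy as np

def grid_mask(lons, lats, poly_x, poly_y):
    """Boolean (lat, lon) mask of grid points inside a polygon (even-odd rule).

    Loops over the polygon edges only; each step tests every grid point at once.
    Works for open rings and for closed rings (last vertex == first vertex).
    """
    X, Y = np.meshgrid(lons, lats)             # shape (lats.size, lons.size)
    inside = np.zeros(X.shape, dtype=bool)
    px = np.asarray(poly_x, dtype=float)
    py = np.asarray(poly_y, dtype=float)
    for i in range(px.size):
        j = i - 1                               # previous vertex; i = 0 wraps to the last one
        xi, yi, xj, yj = px[i], py[i], px[j], py[j]
        if yi == yj:
            continue                            # horizontal or zero-length edges never straddle a ray
        straddles = (yi > Y) != (yj > Y)
        x_cross = xi + (Y - yi) * (xj - xi) / (yj - yi)
        inside ^= straddles & (X < x_cross)     # toggle points whose rightward ray crosses this edge
    return inside

# Same dummy grid and polygon as in the example.
lats = np.linspace(0, 20, 16)
lons = np.linspace(0, 16, 16)
poly_x = np.array([2, 12, 9, 6, 2])
poly_y = np.array([4, 8.5, 15, 14, 4])
mask = grid_mask(lons, lats, poly_x, poly_y)
print(mask.sum())
```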


I went a little further with the answer by @Bart.

You are working with 3D data: a time dimension plus latitude and longitude.
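With 3D data, a 2D (lat, lon) mask such as the one from get_mask can be applied to all time steps at once. Boolean indexing data[:, mask] returns an array of shape (time, number of cells inside). Here is a sketch with random stand-in data and a rectangular stand-in mask, neither of which comes from the question. It contrasts two summaries that are easy to confuse.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(-50, 50, size=(365, 16, 16))   # stand-in for (time, lat, lon) temperatures
mask = np.zeros((16, 16), dtype=bool)
mask[4:10, 3:8] = True                            # stand-in for a polygon mask (30 cells)

inside = data[:, mask]                            # shape (365, mask.sum())

# Option 1: count in-range days for each pixel, then summarise over the region.
days_per_pixel = ((inside > 10) & (inside < 20)).sum(axis=0)   # one count per cell

# Option 2: average over the region first, then count in-range days of that mean.
regional_mean = inside.mean(axis=1)               # shape (365,)
days_regional = int(((regional_mean > 10) & (regional_mean < 20)).sum())

print(inside.shape, days_per_pixel.mean(), days_regional)
```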

If you need statistics for different regions, I would suggest making a 2D map with the same spatial size as the original input data, in which each region is marked with a distinct integer value.
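Here is a sketch of such a region map. It assumes each administrative unit is available as a list of vertex coordinates, held in a hypothetical units dict, and it uses a vectorized even-odd test as a stand-in for get_mask. Each unit gets an integer label, 0 marks cells outside all units, and where polygons overlap the later unit overwrites the earlier one.

```python
import numpy as np

def polygon_mask(X, Y, px, py):
    """Even-odd point-in-polygon test for 2D coordinate arrays X, Y."""
    inside = np.zeros(X.shape, dtype=bool)
    for i in range(len(px)):
        j = i - 1                                   # previous vertex (wraps around)
        if py[i] == py[j]:
            continue                                # horizontal edge: cannot straddle a ray
        straddles = (py[i] > Y) != (py[j] > Y)
        x_cross = px[i] + (Y - py[i]) * (px[j] - px[i]) / (py[j] - py[i])
        inside ^= straddles & (X < x_cross)
    return inside

def label_regions(lons, lats, units):
    """Integer (lat, lon) map: 0 outside every unit, k for the k-th unit in `units`."""
    X, Y = np.meshgrid(lons, lats)
    regions = np.zeros(X.shape, dtype=int)
    for label, (px, py) in enumerate(units.values(), start=1):
        regions[polygon_mask(X, Y, px, py)] = label
    return regions

lons = lats = np.arange(0.5, 6, 1.0)                # 6 x 6 grid of cell centres
units = {                                           # hypothetical administrative units
    "west": ([0, 3, 3, 0], [0, 0, 6, 6]),
    "east": ([3, 6, 6, 3], [0, 0, 6, 6]),
}
regions = label_regions(lons, lats, units)
print(regions)
```

A map like this can take the place of regions in the code below. np.unique(regions) then iterates over the labels, including 0, which may need skipping.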

Here is updated code, based on @Bart's initial solution. For each of two regions, it counts the number of days on which the regional mean temperature falls within a predefined range:
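Written out, the code below computes the following statistic for each region r. Here R_r is the set of grid cells with label r, T(t, i, j) is the temperature on day t at cell (i, j), and T_lo and T_hi correspond to temprange[0] and temprange[1]. The inequalities are strict, as in the code:

```latex
\bar{T}_r(t) = \frac{1}{|R_r|} \sum_{(i,j) \in R_r} T(t, i, j),
\qquad
N_r = \sum_{t} \mathbf{1}\!\left[\, T_{\mathrm{lo}} < \bar{T}_r(t) < T_{\mathrm{hi}} \,\right]
```

This counts the days on which the regional mean lies in the range. Counting in-range days per pixel first and then averaging over the region generally gives a different number.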

import numpy as np
#from numba import jit

#@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

#@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate Euclidean distance.
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

#@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point (x,y) is exactly on line segment ((x1,y1) to (x2,y2))
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

#@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside the polygon; points exactly on an edge count as outside.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        #if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) == 0):
        #    return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    # Non-zero winding number means the point is inside
    return wn != 0

#@jit(nopython=True, nogil=True)
def get_mask(mask, x, y, poly_x, poly_y):
    """
    Generate mask for grid points inside polygon
    """

    for j in range(y.size):
        for i in range(x.size):
            # Use the function arguments, not the global lons/lats arrays
            if is_inside(x[i], y[j], poly_x, poly_y, poly_x.size):
                mask[j,i] = True

    return mask


if __name__ == '__main__':
    import matplotlib.pyplot as plt
    plt.close('all')
    # --------------------------------------------------------------------------------------------------------------
    # Dummy data.
    lats = np.linspace(0, 20, 16)
    lons = np.linspace(0, 16, 16)
    time = np.arange(365)  # day index 0..364
    # --------------------------------------------------------------------------------------------------------------
    # let us make some random yearly data of temperature:
    np.random.seed(9)
    data = 100*(np.random.random((time.size,lats.size,lons.size))-0.5)  # purely random data, uniform in [-50, 50)
    temprange = [10,20]
    # --------------------------------------------------------------------------------------------------------------
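    # Region boundary: a single polygon here, but several could be defined the same way.
    # Polygon vertices (closed ring: the last vertex repeats the first).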
    poly_x = np.array([2, 12, 9, 6, 2])
    poly_y = np.array([4, 8.5, 15, 14, 4])
    # ---------------------------------------------------------------------------------------------------------------
    # Define regions for calculating statistics: value 1 inside the polygon, 2 outside it. Many more regions could be defined the same way:
    regions = np.zeros((lats.size,lons.size))
    mask = np.zeros((lats.size,lons.size), dtype=bool)
    get_mask(mask, lons, lats, poly_x, poly_y)
    regions[mask] = 1
    regions[~mask] = 2
    # ---------------------------------------------------------------------------------------------------------------
    # our "complicated" region map:
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111)
    p0 = ax.pcolormesh(lons, lats, regions)
    plt.colorbar(p0)
    ax.set_title('Different regions')
    plt.show()
    # ---------------------------------------------------------------------------------------------------------------
    # Let us find the number of days within some range:
    statsout = {}
    for regval in np.unique(regions):
        regmeantemp = data[:, regions == regval].mean(axis=1)  # mean time series over the region
        # -----------------------------------------------------------------------------------------------------------
        fig = plt.figure(figsize=(10, 10))
        ax = fig.add_subplot(111)
        ax.plot(time, regmeantemp)
        ax.set_xlabel('Time')
        ax.set_ylabel('Mean temperature')
        ax.set_title('Mean inside region with value ' + str(regval))
        plt.show()
        # -----------------------------------------------------------------------------------------------------------
        # let us collect the occurrences of temperature in some pre-defined range:
        statsout[regval] = np.sum((regmeantemp > temprange[0]) & (regmeantemp < temprange[1]))
    # ----------------------------------------------------------------------------------------------------------------
    print(statsout)
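
With many administrative units, the averaging done in the loop over np.unique(regions) can instead be done in one call to np.bincount, which sums weights per non-negative integer bin. Here is a sketch, assuming regions holds non-negative integer labels (the float labels 1.0 and 2.0 above are cast to int) and data is ordered (time, lat, lon). The helper name regional_means is hypothetical.

```python
import numpy as np

def regional_means(data, regions):
    """Mean time series for every integer label in `regions` at once.

    data: array ordered (time, lat, lon); regions: (lat, lon) array of labels >= 0.
    Returns shape (n_labels, time); labels with no cells give rows of NaN.
    """
    labels = np.asarray(regions).astype(int).ravel()      # (ncells,)
    n_time = data.shape[0]
    n_labels = labels.max() + 1
    values = data.reshape(n_time, -1)                     # (time, ncells), same cell order as labels
    # Give every (time, label) pair its own bin so one bincount call computes all sums.
    bins = (labels[None, :] + n_labels * np.arange(n_time)[:, None]).ravel()
    sums = np.bincount(bins, weights=values.ravel(), minlength=n_time * n_labels)
    counts = np.bincount(labels, minlength=n_labels)      # number of cells per label
    with np.errstate(invalid="ignore", divide="ignore"):  # 0/0 -> NaN for empty labels
        means = sums.reshape(n_time, n_labels) / counts
    return means.T

# Usage with random stand-in data and a two-region map (label 0 has no cells here).
rng = np.random.default_rng(9)
data = rng.uniform(-50, 50, size=(365, 16, 16))
regions = np.zeros((16, 16))
regions[:, :8] = 1
regions[:, 8:] = 2
means = regional_means(data, regions)                     # shape (3, 365)
days = ((means > 10) & (means < 20)).sum(axis=1)          # in-range days per label
print(days)
```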

Hope this helps!

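Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

ICP registration: Yue ICP Bei No. 18138465  © 2020-2024 STACKOOM.COM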