
python get month of maximum value xarray

How to get the month of maximum runoff

I want to get the month of maximum runoff for each year, and for the time series as a whole. The idea is to characterise global seasonality by looking at the month of maximum runoff. I then want to consider whether each pixel has a unimodal or bimodal regime.

I want to create a map like the one in the Pangeo Examples.

[Image: example map]

That map shows the hour of maximum precipitation. I want to show the month of maximum runoff (as an integer).

Getting the data

Here I download the GRUN runoff data and create an xarray object. NOTE: the dataset is >1GB. I am using it to make this example entirely reproducible.

# get the data
import subprocess
command = """
wget -O grun.nc https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/324386/GRUN_v1_GSWP3_WGS84_05_1902_2014.nc?sequence=1&isAllowed=y
"""
import os
if not os.path.exists('grun.nc'):
  process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
  output, error = process.communicate()

# read the data
import xarray as xr
ds = xr.open_dataset('grun.nc')

# select a subset so we can work with it more quickly
ds = ds.isel(time=slice(-100,-1))
ds

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 99)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * time     (time) datetime64[ns] 2006-09-01 2006-10-01 ... 2014-11-01
Data variables:
    Runoff   (time, lat, lon) float32 ...
Attributes:
    title:                   GRUN
    version:                 GRUN 1.0
    meteorological_forcing:  GSWP3
    temporal_resolution:     monthly
    spatial_resolution:      0.5x0.5
    crs:                     WGS84
    proj4:                   +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
    EPSG:                    4326
    references:              Ghiggi et al.,2019. GRUN: An observation-based g...
    authors:                 Gionata Ghiggi; Lukas Gudmundsson
    contacts:                gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
    institution:             Land-Climate Dynamics, Institute for Atmospheric...
    institution_id:          IAC ETHZ

What I have tried

I have NaN values, so I can't just apply argmax() to the dataset. I use the same approach as @jhamman, combined with the Pangeo Examples above. I'm not entirely sure what this is giving me.

# Apply argmax where you have NAN values
def my_func(ds, dim=None):
    return ds.isel(**{dim: ds['Runoff'].argmax(dim)})

mask = ds['Runoff'].isel(time=0).notnull()  # determine where you have valid data
ds2 = ds.fillna(-9999)  # fill nans with a missing flag of some kind
new = ds2.reset_coords(drop=True).groupby('time.month').apply(my_func, dim='time').where(mask)  # do the groupby operation/reduction and reapply the mask
new

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, month: 12)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    Runoff   (month, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
    title:                   GRUN
    version:                 GRUN 1.0
    meteorological_forcing:  GSWP3
    temporal_resolution:     monthly
    spatial_resolution:      0.5x0.5
    crs:                     WGS84
    proj4:                   +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
    EPSG:                    4326
    references:              Ghiggi et al.,2019. GRUN: An observation-based g...
    authors:                 Gionata Ghiggi; Lukas Gudmundsson
    contacts:                gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
    institution:             Land-Climate Dynamics, Institute for Atmospheric...
    institution_id:          IAC ETHZ

This gives me:

import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(12,8))
new.Runoff.sel(month=10).plot(ax=ax,  cmap='twilight')

[Image: my current output]

Ideal Output

What I want is for the value of each pixel to be the month of maximum runoff.

Happy to convert to pandas if necessary.

So I would end up with an xr.Dataset containing the integer month of maximum runoff. Ideally, I would also have the month of maximum runoff for each year, so I can see how this seasonality has changed over time.

<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
Data variables:
    Month_of_max (lat, lon) int32 ...

# OR EVEN BETTER
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, year: 10)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * year     (year) float64 2010 2011 2012 2013 ... 
Data variables:
    Month_of_max (lat, lon, year) int32 ...

The best solution I found was to convert to a pandas.DataFrame object and do the calculations there. I have wrapped the solution into the functions below.

First, let's work with a subset of the data (it takes ages otherwise). This is a box around Kenya.

import xarray as xr
ds = xr.open_dataset('grun.nc')
ds = ds.isel(time=slice(-20,-1))
ds = ds.sel(lat=slice(-5.202,6.002),lon=slice(33.501,42.283))

ds.attrs = {}  # drop the global attributes to keep the printout short
ds


Out[]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 18, time: 19)
Coordinates:
  * lon      (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
  * lat      (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
  * time     (time) datetime64[ns] 2013-05-01 2013-06-01 ... 2014-11-01
Data variables:
    Runoff   (time, lat, lon) float32 ...

The work is all done and tied together in calculate_annual_month_of_max(). It converts the xr.Dataset to a pd.DataFrame and then extracts the time step of maximum Runoff for each lat-lon-year combination. The beauty of this approach is that it returns both the Runoff value and the month integer.

import pandas as pd

def convert_to_df(ds):
    """
    Flatten the dataset so that lat, lon and time become ordinary columns.

    Returns:
    -------
    pd.DataFrame
    """
    df = ds.to_dataframe()
    df.reset_index(inplace=True)
    return df


def calculate_year_month_cols(df):
    """Add integer 'year' and 'month' columns derived from the 'time' column."""
    assert 'time' in df.columns, f"time should be in df.columns. Currently: {list(df.columns)}"
    df['year'] = df['time'].dt.year
    df['month'] = df['time'].dt.month

    return df


def calculate_month_of_max_value(df, value_col):
    """
    Arguments
    ---------
    df : pd.DataFrame
        dataframe converted from xarray with ['lat','lon', 'year', value_col] columns

    value_col : str
        column that you want to find the month of maximum for 
        e.g. Which month (int) in each pixel (lat,lon) has the highest runoff
    """
    max_months = df.loc[df.groupby(["lat","lon","year"])[value_col].idxmax()]
    return max_months


def convert_dataframe_to_xarray(df, index_cols=['lat','lon']):
    """
    Arguments
    ---------
    df: pd.DataFrame
        the dataframe to convert to xr.dataset

    index_cols: List[str]
        the columns that will become the coordinates 
        of the output xr.Dataset

    Returns
    -------
    xr.Dataset
    """
    out = df.set_index(index_cols).dropna()
    ds = out.to_xarray()
    return ds


def calculate_annual_month_of_max(ds, variable):
    """for the `variable` in the `ds` calculate the 
    month of maximum for a given pixel-year.

    Returns:
    -------
    xr.Dataset
    """
    # convert to a dataframe
    df = convert_to_df(ds)
    df = calculate_year_month_cols(df)
    # calculate the month of maximum
    df = calculate_month_of_max_value(df, value_col=variable)
    # reconstitute the dataframe object
    ds_out = convert_dataframe_to_xarray(df, index_cols=['lat','lon','year'])

    return ds_out


mon_of_max = calculate_annual_month_of_max(ds, variable='Runoff')
mon_of_max


Out[]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 18, year: 2)
Coordinates:
  * lat      (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
  * lon      (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
  * year     (year) float64 2.013e+03 2.014e+03
Data variables:
    time     (lat, lon, year) datetime64[ns] 2013-12-01 ... 2014-10-01
    Runoff   (lat, lon, year) float32 0.5894838 0.9081207 ... 0.2789653
    month    (lat, lon, year) float64 12.0 1.0 12.0 1.0 ... 11.0 10.0 11.0 10.0

Which looks like:

[Image: median month of maximum runoff (2013-2014)]
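For comparison, the same per-year result can probably be obtained without leaving xarray. The following is only a sketch under stated assumptions: the helper name month_of_max_per_year is hypothetical, the data is a small synthetic stand-in for ds['Runoff'] rather than GRUN, and pixels with no valid data in a given year are returned as NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr


def month_of_max_per_year(da):
    """Month (1..12) of the maximum of `da` for each pixel and year.

    Hypothetical helper, not part of xarray. Pixels that are all-NaN
    within a year come back as NaN.
    """

    def _one_year(group):
        # Pixels with at least one valid value in this year.
        valid = group.notnull().any("time")
        # Replace NaN with -inf so argmax never selects a missing value.
        idx = group.fillna(-np.inf).argmax("time")
        # Calendar month of every time step in this year's group.
        months = group["time"].dt.month.values
        # Look up the month at the position of the maximum.
        out = xr.DataArray(months[idx.values], coords=idx.coords, dims=idx.dims)
        return out.where(valid)

    return da.groupby("time.year").map(_one_year).rename("month_of_max")


# Synthetic stand-in for ds['Runoff']: two years of monthly data, 2x2 grid.
time = pd.date_range("2013-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
data = rng.random((24, 2, 2))
data[:, 0, 0] = np.nan   # a pixel with no valid data, e.g. ocean
data[6, 1, 1] = 10.0     # July 2013 peak at the last pixel
data[22, 1, 1] = 10.0    # November 2014 peak at the same pixel
runoff = xr.DataArray(
    data,
    coords={"time": time, "lat": [0.25, 0.75], "lon": [0.25, 0.75]},
    dims=("time", "lat", "lon"),
    name="Runoff",
)

result = month_of_max_per_year(runoff)
```

On the real data the call would be month_of_max_per_year(ds['Runoff']). Unlike the pandas route, this returns only the month and not the Runoff value at the maximum.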

I have nan values so I can't just apply an argmax() to the dataset.

Indeed.

Consider using .fillna(0) before applying argmax. (Or perhaps .dropna().)
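A minimal sketch of that suggestion, for the month of maximum over the whole time series. It assumes runoff is non-negative (so 0 is a safe fill value), that all twelve calendar months are present in the data, and it uses a small synthetic array in place of ds['Runoff']. Pixels with no valid data are masked back to NaN afterwards so the fill value does not leak into the result.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds['Runoff'] (an assumption, not the GRUN data):
# two years of monthly values on a 2x2 grid.
time = pd.date_range("2013-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
data = rng.random((24, 2, 2))
data[:, 0, 0] = np.nan   # a pixel with no valid data, e.g. ocean
data[6, 1, 1] = 10.0     # July 2013 peak at the last pixel
data[18, 1, 1] = 10.0    # July 2014 peak at the same pixel
runoff = xr.DataArray(
    data,
    coords={"time": time, "lat": [0.25, 0.75], "lon": [0.25, 0.75]},
    dims=("time", "lat", "lon"),
    name="Runoff",
)

# Mean runoff for each calendar month (dimension "month", values 1..12).
clim = runoff.groupby("time.month").mean("time")

# Remember which pixels have any valid data before filling.
valid = clim.notnull().any("month")

# argmax returns a 0-based position along "month"; because all twelve
# months are present and sorted, position + 1 is the calendar month.
idx = clim.fillna(0).argmax("month")
month_of_max = (idx + 1).where(valid).rename("month_of_max")
```

The result has dimensions (lat, lon) and can be plotted directly with month_of_max.plot().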
