xarray: Combine data variables with discrete observations in a new continuous dimension

I am working with a crop calendar that records the day of the year (doy) on which a given phenological state occurs; here, the mean planting (plant) and harvest (harvest) dates. Note that the nan values printed below are ocean pixels; all other values are integer days stored as float32:

<xarray.Dataset>
Dimensions:  (y: 2160, x: 4320)
Coordinates:
  * x        (x) float64 -180.0 -179.9 -179.8 -179.7 ... 179.7 179.8 179.9 180.0
  * y        (y) float64 89.96 89.88 89.79 89.71 ... -89.71 -89.79 -89.88 -89.96
Data variables:
    plant    (y, x) float32 nan nan nan nan nan nan ... nan nan nan nan nan nan
    harvest  (y, x) float32 nan nan nan nan nan nan ... nan nan nan nan nan nan

I need to combine the two variables into a single DataArray with dimensions (doy: 365, y: 2160, x: 4320) so that I can track, for each pixel, the phenological state as a function of doy. Conceptually, the steps I have identified so far are:

  1. Assign a numerical value to each state, e.g. off=0, plant=1, harvest=2.
  2. Use the doy as an index into the doy dimension of the new DataArray and assign the numerical value of the corresponding state on that day.
  3. Fill in the values in between using something similar to pandas.DataFrame.fillna with method='ffill'.
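The three steps above can be done without any Python loop over pixels. Below is a minimal NumPy sketch on hypothetical toy inputs (plant and harvest stand in for ds.plant.values and ds.harvest.values; none of these names come from the question). Instead of a fillna-style forward fill (xarray's DataArray.ffill exists but relies on an optional accelerator package such as bottleneck or numbagg), it forward-fills with a running-maximum index trick.

```python
import numpy as np

# Hypothetical toy inputs standing in for ds.plant.values / ds.harvest.values:
# a (y=2, x=2) grid of 1-indexed days of year, with NaN for an ocean pixel.
plant = np.array([[100.0, 120.0], [np.nan, 300.0]])
harvest = np.array([[250.0, 200.0], [np.nan, 330.0]])

OFF, PLANT, HARVEST = 0, 1, 2  # step 1: one numerical code per state
n_days = 365

# Step 2: start from an all-OFF (doy, y, x) cube and write each state code
# at the index of the day it begins (day d -> index d - 1).
events = np.full((n_days, *plant.shape), OFF, dtype=np.int8)
valid = ~np.isnan(plant) & ~np.isnan(harvest)  # skip ocean pixels
yy, xx = np.nonzero(valid)
events[plant[yy, xx].astype(int) - 1, yy, xx] = PLANT
events[harvest[yy, xx].astype(int) - 1, yy, xx] = HARVEST

# Step 3: forward fill along doy. A running maximum over "index where an
# event happened" gives, for every day, the index of the latest event so far.
# Days before any event gather index 0, which holds OFF unless an event falls
# on day 1 (in which case that event is indeed the latest one).
day_idx = np.arange(n_days)[:, None, None]
last_event = np.maximum.accumulate(np.where(events != OFF, day_idx, 0), axis=0)
state = np.take_along_axis(events, last_event, axis=0)
```

For seasons that stay within one calendar year, this gives the same codes as comparing each day against the planting and harvest dates directly, which is what the broadcasting approach exploits.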

I went through the "Reshaping and reorganizing data" and "Combining data" pages of the xarray documentation, but with my current understanding of xarray I honestly don't know where to start.

Can anyone point me in a direction? Is what I am trying to do even achievable using only matrix operations, or do I have to introduce loops?

PS: Apologies for the confusing formulation of the question. I guess that only reflects something fundamental that I am still missing.

You can exploit xarray's automatic broadcasting rules to create a boolean mask of all days on or after each date in an array of dates indexed by x/y:

import numpy as np
import xarray as xr

# I'm assuming your "day of year" values are 1-indexed, and you're
# using a 365-day calendar. I'll leave leap year handling to you :)
days_of_year = xr.DataArray(
    np.arange(1, 366), dims=["day_of_year"], coords=[np.arange(1, 366)],
)

# broadcast (y, x) against (day_of_year,) by dimension name
planted = ds.plant <= days_of_year
harvested = ds.harvest <= days_of_year

# cast before summing: adding two boolean arrays is a logical OR,
# which would never produce the code 2
state = planted.astype("int8") + harvested.astype("int8")

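The result will be an array with dimensions (y, x, day_of_year) containing the codes you described: 0 before planting, 1 from the planting day until the day before harvest, and 2 from the harvest day onward. Ocean pixels come out as 0, because any comparison with NaN is False. Call .transpose("day_of_year", "y", "x") if you need the day dimension first.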
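A small check of the approach on a hypothetical 2 x 2 dataset (the values and coordinates below are made up for illustration). The int8 cast also matters at full size: 365 x 2160 x 4320 is about 3.4 billion cells, so the result takes roughly 3.4 GB as int8 versus roughly 27 GB if the masks were cast to 64-bit integers.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 2 stand-in for the real dataset: day-of-year values
# stored as float32, with NaN marking an ocean pixel.
ds = xr.Dataset(
    {
        "plant": (("y", "x"), np.array([[100, 120], [np.nan, 300]], dtype="float32")),
        "harvest": (("y", "x"), np.array([[250, 200], [np.nan, 330]], dtype="float32")),
    },
    coords={"y": [10.0, -10.0], "x": [-50.0, 50.0]},
)

days_of_year = xr.DataArray(
    np.arange(1, 366), dims=["day_of_year"], coords=[np.arange(1, 366)],
)

# Broadcasting by dimension name: (y, x) <= (day_of_year,) -> (y, x, day_of_year)
planted = (ds.plant <= days_of_year).astype("int8")
harvested = (ds.harvest <= days_of_year).astype("int8")
state = planted + harvested

# Put the new dimension first, matching the (doy, y, x) layout in the question
state = state.transpose("day_of_year", "y", "x")
```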

A note on growing season data:关于生长期数据的说明:

I've worked with this type of data before, and one thing to watch out for is regions where the growing season spans Jan 1 (the harvest day is smaller than the planting day), which breaks the method above. Alternatively, you could create an "is_growing_season" mask, which flexibly handles growing seasons spanning two calendar years:

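is_growing_season = xr.where(
    ds.harvest >= ds.plant,
    # season inside one calendar year: plant <= doy <= harvest
    ((days_of_year >= ds.plant) & (days_of_year <= ds.harvest)),
    # season wrapping past Jan 1: doy >= plant OR doy <= harvest
    ((days_of_year >= ds.plant) | (days_of_year <= ds.harvest)),
)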
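A quick sanity check of the mask on a hypothetical 1 x 3 grid: a season within one calendar year (planted day 100, harvested day 250), a season wrapping past Jan 1 (planted day 300, harvested day 60), and an ocean pixel. For the wrapping pixel, the 0/1/2 codes from the first snippet would be wrong (for example, 0 on Jan 1 although the crop is in the ground), while the mask stays correct.

```python
import numpy as np
import xarray as xr

# Hypothetical 1 x 3 stand-in: a within-year season, a season wrapping
# past Jan 1, and an ocean pixel (NaN).
ds = xr.Dataset(
    {
        "plant": (("y", "x"), np.array([[100, 300, np.nan]], dtype="float32")),
        "harvest": (("y", "x"), np.array([[250, 60, np.nan]], dtype="float32")),
    },
)
days_of_year = xr.DataArray(
    np.arange(1, 366), dims=["day_of_year"], coords=[np.arange(1, 366)],
)

is_growing_season = xr.where(
    ds.harvest >= ds.plant,
    # season inside one calendar year: plant <= doy <= harvest
    (days_of_year >= ds.plant) & (days_of_year <= ds.harvest),
    # season wrapping past Jan 1: doy >= plant OR doy <= harvest
    (days_of_year >= ds.plant) | (days_of_year <= ds.harvest),
)
```

NaN pixels fall into the second branch, but every comparison with NaN is False, so they are never flagged as growing season.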
