
Slicing xarray dataset with coordinate-dependent variable

I built an xarray dataset in Python 3 with coordinates (time, levels) to identify all cloud bases and cloud tops during one day of observations. The variable levels is the dimension for the cloud bases/tops that can be identified at a given time. It stores the cloud base/top height values for each time.

Now I want to select all the cloud bases and tops that are located within a given range of heights, where the range changes in time. The height range is identified by the arrays bottom_mod and top_mod. These arrays have a time dimension and contain the edges of the range of heights to be selected.

The xarray dataset is cloudStandard_mod_reshaped:

Dimensions:     (levels: 8, time: 9600)
Coordinates:
  * levels      (levels) int64 0 1 2 3 4 5 6 7
  * time        (time) datetime64[ns] 2013-04-14 ... 2013-04-14T23:59:51
Data variables:
    cloudTop    (time, levels) float64 nan nan nan nan nan ... nan nan nan nan
    cloudThick  (time, levels) float64 nan nan nan nan nan ... nan nan nan nan
    cloudBase   (time, levels) float64 nan nan nan nan nan ... nan nan nan nan

I tried to select the heights in the range identified by the top and bottom arrays as follows:

PBLclouds = cloudStandard_mod_reshaped.sel(levels=slice(bottom_mod[:], top_mod[:]))

but this instruction accepts only scalar values for the slice command.

Do you know how to slice with values that are coordinate-dependent?

You can use the .where() method.

The line that provides the solution is under step 2.

1. First, create some data like yours:

The dataset:

import numpy as np
import xarray as xr

nlevels, ntime = 8, 50

ds = xr.Dataset(
    coords=dict(levels=np.arange(nlevels), time=np.arange(ntime)),
    data_vars=dict(
        cloudTop=(("levels", "time"), np.random.randn(nlevels, ntime)),
        cloudThick=(("levels", "time"), np.random.randn(nlevels, ntime)),
        cloudBase=(("levels", "time"), np.random.randn(nlevels, ntime)),
    ),
)

Output of print(ds):

<xarray.Dataset>
Dimensions:     (levels: 8, time: 50)
Coordinates:
  * levels      (levels) int64 0 1 2 3 4 5 6 7
  * time        (time) int64 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49
Data variables:
    cloudTop    (levels, time) float64 0.08375 0.04721 0.9379 ... 0.04877 2.339
    cloudThick  (levels, time) float64 -0.6441 -0.8338 -1.586 ... -1.026 -0.5652
    cloudBase   (levels, time) float64 -0.05004 -0.1729 0.7154 ... 0.06507 1.601

For the top and bottom levels, I'll make the bottom level random and just add an offset to construct the top level.

offset = 3

bot_mod = xr.DataArray(
    dims=("time",),
    coords=dict(time=np.arange(ntime)),
    data=np.random.randint(0, nlevels - offset, ntime),
    name="bot_mod",
)

top_mod = (bot_mod + offset).rename("top_mod")

Output of print(bot_mod):

<xarray.DataArray 'bot_mod' (time: 50)>
array([0, 1, 2, 2, 3, 1, 2, 1, 0, 2, 1, 3, 2, 0, 2, 4, 3, 3, 2, 1, 2, 0,
       2, 2, 0, 1, 1, 4, 1, 3, 0, 4, 0, 4, 4, 0, 4, 4, 1, 0, 3, 4, 4, 3,
       3, 0, 1, 2, 4, 0])

2. Then, select the range of levels where the clouds are:

Use the .where() method to select the dataset values that lie between the bottom level and the top level:

ds_clouds = ds.where((ds.levels > bot_mod) & (ds.levels < top_mod))

Output of print(ds_clouds):

<xarray.Dataset>
Dimensions:     (levels: 8, time: 50)
Coordinates:
  * levels      (levels) int64 0 1 2 3 4 5 6 7
  * time        (time) int64 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49
Data variables:
    cloudTop    (levels, time) float64 nan nan nan nan nan ... nan nan nan nan
    cloudThick  (levels, time) float64 nan nan nan nan nan ... nan nan nan nan
    cloudBase   (levels, time) float64 nan nan nan nan nan ... nan nan nan nan
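Why does comparing a 1-D coordinate with a 1-D time series give a 2-D selection? ds.levels has dimension levels and bot_mod has dimension time, so xarray broadcasts the comparison over both dimensions by name. The sketch below rebuilds the toy setup (with a seeded np.random.default_rng instead of np.random, purely for reproducibility) and checks the mask's dimensions and how many levels survive per time step. It also shows that the strict > and < used above exclude the edges; switching to >= and <= keeps them, which may be closer to "the edges of the range" in the question.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
nlevels, ntime, offset = 8, 50, 3

# Same layout as the answer: a level index coordinate and a time coordinate.
ds = xr.Dataset(
    coords=dict(levels=np.arange(nlevels), time=np.arange(ntime)),
    data_vars=dict(
        cloudBase=(("levels", "time"), rng.standard_normal((nlevels, ntime))),
    ),
)

# One bottom level index per time step; the top is a fixed offset above it.
bot_mod = xr.DataArray(
    rng.integers(0, nlevels - offset, ntime),
    dims=("time",),
    coords=dict(time=np.arange(ntime)),
)
top_mod = bot_mod + offset

# levels (dim "levels") vs. bot_mod (dim "time") broadcasts to a 2-D mask.
mask = (ds.levels > bot_mod) & (ds.levels < top_mod)
print(mask.dims)  # ('levels', 'time')

# Strict inequalities keep offset - 1 levels per time step...
kept_strict = mask.sum("levels")
print(int(kept_strict.min()), int(kept_strict.max()))  # 2 2

# ...while inclusive ones keep offset + 1 levels, edges included.
mask_incl = (ds.levels >= bot_mod) & (ds.levels <= top_mod)
kept_incl = mask_incl.sum("levels")
print(int(kept_incl.min()), int(kept_incl.max()))  # 4 4
```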

The .where() call puts nan wherever the condition is not satisfied; you can use the .dropna() method to get rid of those values.
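.dropna() needs a dimension and a "how" rule. A minimal sketch on the same toy data (regenerated with a seeded generator): dropping along levels with how="all" removes only the levels that are NaN at every time step, and .where(..., drop=True) does the same trimming in one call. Levels 0 and 7 always disappear, because with offset = 3 no bot_mod/top_mod pair can select them. Dropping along time with how="any" would remove every time step, since each one has NaN at six of its eight levels.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
nlevels, ntime, offset = 8, 50, 3

ds = xr.Dataset(
    coords=dict(levels=np.arange(nlevels), time=np.arange(ntime)),
    data_vars=dict(
        cloudBase=(("levels", "time"), rng.standard_normal((nlevels, ntime))),
    ),
)
bot_mod = xr.DataArray(
    rng.integers(0, nlevels - offset, ntime),
    dims=("time",),
    coords=dict(time=np.arange(ntime)),
)
top_mod = bot_mod + offset

mask = (ds.levels > bot_mod) & (ds.levels < top_mod)
ds_clouds = ds.where(mask)

# Option 1: drop levels that are NaN at every time step.
trimmed = ds_clouds.dropna(dim="levels", how="all")

# Option 2: let .where() drop labels whose condition is False everywhere.
trimmed_where = ds.where(mask, drop=True)

print(trimmed.levels.values)
print(trimmed_where.levels.values)
```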

3. Check for success:

Plot the cloudBase variable of the dataset before and after processing:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=2)

ds.cloudBase.plot.imshow(ax=axes[0])
ds_clouds.cloudBase.plot.imshow(ax=axes[1])

plt.show()
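The plot is a visual check. A numeric check can complement it: the sketch below, assuming the same toy data (regenerated with a seeded generator, no plotting needed), rebuilds the expected keep pattern with plain NumPy broadcasting, compares it with the positions .where() left non-NaN, and confirms that the surviving values are unchanged copies of the originals.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(1)
nlevels, ntime, offset = 8, 50, 3

ds = xr.Dataset(
    coords=dict(levels=np.arange(nlevels), time=np.arange(ntime)),
    data_vars=dict(
        cloudBase=(("levels", "time"), rng.standard_normal((nlevels, ntime))),
    ),
)
bot_mod = xr.DataArray(
    rng.integers(0, nlevels - offset, ntime),
    dims=("time",),
    coords=dict(time=np.arange(ntime)),
)
top_mod = bot_mod + offset
ds_clouds = ds.where((ds.levels > bot_mod) & (ds.levels < top_mod))

# Positions that survived, shape (levels, time).
kept = ds_clouds.cloudBase.notnull().values

# Expected pattern: at time t, only levels bot_mod[t] + 1 ... top_mod[t] - 1.
lev = np.arange(nlevels)[:, None]  # column of level indices
expected = (lev > bot_mod.values) & (lev < top_mod.values)
print(np.array_equal(kept, expected))  # True

# The surviving values must be untouched copies of the originals.
orig = ds.cloudBase.values
print(np.array_equal(ds_clouds.cloudBase.values[kept], orig[kept]))  # True
```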

I'm not yet allowed to embed images, so here is a link:

Original data vs. selected data
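One caveat: the answer compares the level index (0 to 7) with bot_mod and top_mod. In the question, bottom_mod and top_mod are heights, and the heights live in the cloudBase and cloudTop variables, so the mask would compare those variables instead of ds.levels. A minimal sketch with made-up heights and dims (time, levels) as in the question; the inclusion rule (base at or above bottom_mod and top at or below top_mod) is an assumption, not something the question specifies.

```python
import numpy as np
import xarray as xr

time = np.arange(4)
levels = np.arange(3)

# Hypothetical cloud-base heights in metres, dims (time, levels) as in the question.
# NaN means no cloud layer was detected at that level.
base = np.array([
    [500.0, 2000.0, np.nan],
    [800.0, 1500.0, 4000.0],
    [np.nan, np.nan, np.nan],
    [300.0, 1200.0, 3000.0],
])
ds = xr.Dataset(
    coords=dict(time=time, levels=levels),
    data_vars=dict(
        cloudBase=(("time", "levels"), base),
        cloudTop=(("time", "levels"), base + 200.0),  # hypothetical 200 m thick layers
    ),
)

# Hypothetical time-varying height range: one (bottom, top) pair per time step.
bottom_mod = xr.DataArray([400.0, 1000.0, 0.0, 1000.0], dims="time", coords=dict(time=time))
top_mod = xr.DataArray([1000.0, 2000.0, 500.0, 3500.0], dims="time", coords=dict(time=time))

# Compare the height variables (not the level index) with the time-varying edges.
# bottom_mod/top_mod broadcast along "levels"; NaN compares as False, so empty slots stay NaN.
in_range = (ds.cloudBase >= bottom_mod) & (ds.cloudTop <= top_mod)
PBLclouds = ds.where(in_range)
print(PBLclouds.cloudBase.values)
```

As with the level-index version, .dropna(dim="levels", how="all") can then remove levels that never contain an in-range cloud.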

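Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.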

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM