Access data by month number in 3D xarray

I have data arrays (361x361) for Jan, Feb, March, Apr, Oct, Nov and Dec for a given year.

So far I've been storing them in individual netCDF files, one per month (e.g. 03.nc, 10.nc).

I'd like to combine all months into one netCDF file, so that I can do something like:

march_data = data.sel(month='03') 

or alternatively data.sel(month=3).

So far I've only been able to stack the monthly data in a 361x361x7 array, and it's unhelpfully indexed: to get March data you need data[:,:,2], and to get October it's data[:,:,4]. Clearly 2 and 4 do not intuitively correspond to March and October. This is partly because Python indexes from zero and partly because I'm missing the summer months. I could put NaN fields in for the missing months, but that wouldn't solve the zero-indexing issue.

My attempt so far:

data = xarray.Dataset(
    data_vars={'ice_type': (['x', 'y', 'time'], year_array)},
    coords={'lon': (['x', 'y'], lon_target),
            'lat': (['x', 'y'], lat_target),
            'month_number': (['time'], month_int)})

Here year_array is a 361x361x7 numpy array, and month_int is a list that maps the third index of year_array to the month number: [1,2,3,4,10,11,12].

When I try to get October data with oct = data.sel(month_number=10), it throws an error.
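A likely cause, sketched here under assumptions: month_number is attached to the time dimension but is not that dimension's index, and .sel() only looks up indexed coordinates. One way around it is to swap the dimension for the coordinate. The 4x4 random array below is a stand-in for the real 361x361 data, and the lon/lat coordinates are left out for brevity.

```python
import numpy as np
import xarray as xr

# Stand-in data (assumed shapes; the real arrays are 361x361x7).
month_int = [1, 2, 3, 4, 10, 11, 12]
year_array = np.random.rand(4, 4, len(month_int))

data = xr.Dataset(
    data_vars={"ice_type": (["x", "y", "time"], year_array)},
    coords={"month_number": (["time"], month_int)},
)

# month_number lives on "time" but is not an index, so make it the
# dimension coordinate before selecting by label.
by_month = data.swap_dims({"time": "month_number"})
oct_data = by_month.sel(month_number=10)
```

After the swap, the third dimension is named month_number and can be selected by month value rather than by position.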

On a side note, I'm aware that there's possibly a solution to be found here, but to be honest I don't understand how it works. My confusion is mostly about how they use 'time' both as a dictionary key and as a list of times at the same time.

I think I've written a helper function to do something just like that:

import xarray as xr


def combine_new_ds_dim(ds_dict, new_dim_name):
    """
    Combines a dictionary of datasets along a new dimension using dictionary keys
    as the new coordinates.

    Parameters
    ----------
    ds_dict : dict
        Dictionary of xarray Datasets or DataArrays
    new_dim_name : str
        The name of the newly created dimension

    Returns
    -------
    xarray.Dataset or xarray.DataArray
        The inputs concatenated along the new dimension
    """

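    expanded_dss = []

    for k, v in ds_dict.items():
        # Add the new length-1 dimension, then label it with the dictionary key.
        expanded_dss.append(v.expand_dims(new_dim_name))
        expanded_dss[-1][new_dim_name] = [k]
    # Join the expanded objects along the new dimension.
    new_ds = xr.concat(expanded_dss, new_dim_name)

    return new_ds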

If you have all of the data in individual netCDF files, you should be able to load each one into its own DataArray. Assuming you've done that, you could then do

month_das = {
    1: january_da,
    2: february_da,
    ...
    12: december_da
}

year_data = combine_new_ds_dim(month_das, 'month')

which would be the concatenation of all of the data along the new dimension month, with the desired coordinates. I think the main loop of the function is easy enough to pull out if you want to use it on its own.
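As a rough illustration of that loop used on its own, here is a minimal sketch. The small constant arrays are hypothetical stand-ins for the monthly DataArrays (in practice each might come from something like xr.open_dataarray("03.nc")), and the month list is the one from the question.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the monthly arrays (real ones are 361x361).
months = [1, 2, 3, 4, 10, 11, 12]
month_das = {
    m: xr.DataArray(np.full((4, 4), float(m)), dims=("x", "y")) for m in months
}

expanded = []
for month, da in month_das.items():
    da = da.expand_dims("month")  # add a length-1 "month" dimension
    da["month"] = [month]         # label it with the dictionary key
    expanded.append(da)

# Join along "month"; the labels become the coordinate values.
year_data = xr.concat(expanded, "month")
march_data = year_data.sel(month=3)
```

Because the coordinate holds the month numbers themselves, the missing summer months and zero-based positions no longer matter when selecting.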

EDIT:

For anyone looking at this in the future, there's a much easier way of doing this with built-in xarray functions. You can just concatenate along a new dimension

year_data = xr.concat([january_da, february_da, ..., december_da], dim="month")

which will create a new DataArray with the constituent arrays concatenated along a new dimension, but without coordinates on that dimension. To add coordinates,

year_data["month"] = [1, 2, ..., 12]

at which point year_data is concatenated along the new dimension "month" and has the desired coordinates along that dimension.
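Putting those two steps together for the seven months in the question, a minimal sketch might look like this. The constant 4x4 arrays and the file name in the final comment are hypothetical; the pd.Index variant is an alternative that attaches the coordinate during the concatenation itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins for the seven monthly arrays.
months = [1, 2, 3, 4, 10, 11, 12]
monthly_das = [
    xr.DataArray(np.full((4, 4), float(m)), dims=("x", "y")) for m in months
]

# Step 1: concatenate along a new, unlabelled "month" dimension.
year_data = xr.concat(monthly_das, dim="month")
# Step 2: attach the month numbers as that dimension's coordinate.
year_data["month"] = months

oct_data = year_data.sel(month=10)

# Alternative: pass a named pandas Index so the coordinate is set in one step.
alt = xr.concat(monthly_das, dim=pd.Index(months, name="month"))

# To store everything in one file: year_data.to_netcdf("year.nc")  # hypothetical name
```

Either way, selection is by month number, so October is month=10 regardless of its position in the stack.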
