Wrong time dimension after groupby using the xarray library (Python)

Question

My problem is that I would like to use the convenient functionality of the xarray library in Python, but I run into problems with the time dimension when aggregating data.

I have opened a dataset that contains daily data for the year 2013: datset=xr.open_dataset(filein) .

The contents of the file are:

<xarray.Dataset>
Dimensions:       (bnds: 2, rlat: 228, rlon: 234, time: 365)
Coordinates:
  * rlon          (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 ...
  * rlat          (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 ...
  * time          (time) datetime64[ns] 2013-01-01T11:30:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  |S1 ''
    time_bnds     (time, bnds) float64 1.073e+09 1.073e+09 1.073e+09 ...
    ASWGLOB_S     (time, rlat, rlon) float64 nan nan nan nan nan nan nan nan ...
Attributes:
    CDI:                       Climate Data Interface version 1.7.0 (http://m...
    Conventions:               CF-1.4
    references:                http://www.clm-community.eu/
    NCO:                       4.6.7
    CDO:                       Climate Data Operators version 1.7.0

When I now use the groupby method to compute the monthly means, the time dimension is destroyed:

>>> datset.groupby('time.month')
<xarray.core.groupby.DatasetGroupBy object at 0x246a250>
>>> datset.groupby('time.month').mean('time')
<xarray.Dataset>
Dimensions:    (bnds: 2, month: 12, rlat: 228, rlon: 234)
Coordinates:
  * rlon       (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 ...
  * rlat       (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 -22.2 ...
  * month      (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (month, bnds) float64 1.074e+09 1.074e+09 1.077e+09 1.077e+09 ...
    ASWGLOB_S  (month, rlat, rlon) float64 nan nan nan nan nan nan nan nan ...

Instead of a time dimension I now have a month dimension with values from 1 to 12. Is this a side effect of the 'mean' function? As long as I do not use this mean function, the time variable is retained.

What am I doing wrong? The examples given in the documentation and in this forum seem to behave differently. There, time stamps are retained, except that the first date of each month is used.

Can I recreate my old time dimension? What if I want time stamps indicating the middle of the month and 'time_bounds' indicating the interval for each mean value, i.e. beginning of the month and end of the month?

Thanks for your help, Ronny

Answer 1

What you describe is expected behavior: when you aggregate with .groupby and apply a reduction function like mean, the dimension you aggregated over is replaced by the index of the group - in this case the 12 months.

Imagine you have a multi-year time series. Then ds.groupby('time.month').mean(dim='time') gives you the average of each calendar month across all years (e.g. all "Januaries" combined into one average).
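To see this on a multi-year series, here is a minimal sketch. The two-year daily series and the variable name `signal` are made up for illustration and are not taken from the file in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year daily series; "signal" is an invented variable name.
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
ds = xr.Dataset(
    {"signal": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

# Grouping by calendar month pools both Januaries, both Februaries, and so on.
clim = ds.groupby("time.month").mean("time")

print(dict(clim.sizes))      # a "month" dimension replaces "time"
print(clim["month"].values)  # the group labels
```

The result has one entry per calendar month no matter how many years go in, which is why no single time stamp can label it.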

Are you sure you did not want to take a monthly average? Then ds.resample(time='1m').mean(dim='time') is what you need, and it will actually give you a proper time dimension. Depending on your pandas version, the month frequency may need to be spelled 'M', 'ME' or 'MS' instead.
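For comparison, a small sketch of the resample route on an invented two-year daily series (the variable name `signal` is hypothetical). It assumes the month-start alias `"MS"`, so each monthly mean is labelled with the first day of its month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year daily series; "signal" is an invented variable name.
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
ds = xr.Dataset(
    {"signal": ("time", np.ones(time.size))},
    coords={"time": time},
)

# One mean per month of each year; "time" stays a datetime coordinate.
monthly = ds.resample(time="MS").mean()

print(dict(monthly.sizes))
print(monthly["time"].values[:3])
```

Unlike the groupby version, January 2013 and January 2014 stay separate here, so two years give 24 time steps.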

However, if you did want the multi-year aggregated average but still want a proper time dimension, you can replace your new month index with a time index like so:

import datetime
# ds here is the result of ds.groupby('time.month').mean(dim='time')
ds['month'] = [datetime.datetime(2017, month, 1) for month in ds['month'].values]
ds = ds.rename({'month': 'time'})

where 2017 is some year you choose as the year of your monthly index.
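The question also asks for mid-month time stamps plus bounds for each mean. One possible sketch, not a definitive recipe: resample with the month-start alias `"MS"`, then derive the bounds and midpoints from the labels. The input series, the variable name `signal`, and the half-open interval (start of month to start of next month) are assumptions; `time_bnds` and `bnds` follow the names in the question's file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical one-year daily series; "signal" is an invented variable name.
time = pd.date_range("2013-01-01", "2013-12-31", freq="D")
ds = xr.Dataset({"signal": ("time", np.ones(time.size))}, coords={"time": time})

monthly = ds.resample(time="MS").mean()

# Assumed interval for each mean: [start of month, start of next month).
start = pd.DatetimeIndex(monthly["time"].values)
end = start + pd.offsets.MonthBegin(1)

# Re-label each step with the interval midpoint and keep the interval itself.
monthly = monthly.assign_coords(time=start + (end - start) / 2)
monthly["time_bnds"] = (
    ("time", "bnds"),
    np.stack([start.values, end.values], axis=1),
)

print(monthly["time"].values[:2])
print(monthly["time_bnds"].values[0])
```

If you prefer the last day of the month as the upper bound, subtract one day from `end` before stacking.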

Solution 1
3 2017-12-14 10:57:23
