Calculating a time-based rolling average on a multi-indexed dataframe

Question

I have a multi-indexed dataframe, which I group with a mask. Afterwards, I want to calculate a time-based rolling average within each group.

import pandas as pd

time = pd.date_range('2000-05-01', freq='24H', periods=10)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(20), index=mult_index)
mask = list(range(5)) * 4
data.groupby(mask).rolling("2d", on=mult_index.levels[0]).mean()

However, this raises the following exception:

Traceback (most recent call last):
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-77-e484a6b352eb>", line 1, in <module>
rolling.count()
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\common.py", line 40, in outer
return self._groupby.apply(f)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\groupby.py", line 735, in apply
result = self._python_apply_general(f)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\groupby.py", line 751, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\ops.py", line 206, in apply
res = f(group)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\common.py", line 38, in f
return getattr(x, name)(*args, **kwargs)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 1969, in count
return self._apply(window_func, center=self.center, name="count")
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 518, in _apply
return self._wrap_results(results, block_list, obj, exclude)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 331, in _wrap_results
final.append(Series(self._on, index=obj.index, name=name))
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\series.py", line 292, in __init__
f"Length of passed values is {len(data)}, "
ValueError: Length of passed values is 10, index implies 4

Does anyone have an idea how to solve this issue? If I try it without a multi-indexed dataframe, everything works:

time = pd.date_range('2000-05-01', freq='24H', periods=10)
data = pd.DataFrame(range(10), index=time)
mask = list(range(5)) * 2
data.groupby(mask).rolling("2d").mean()

Thanks in advance for your help.

EDIT 1 - Clarification of the question:

In the answer below, DavideBrex proposes resetting the index to solve this issue. However, a consequence of this solution is that rows with number 0 interfere with rows with number 1. I want to avoid this behaviour. See the following additional example:

time = pd.date_range('2000-05-01', freq='24H', periods=3)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(6), index=mult_index)
data.columns=["col"]
mask = [0,0,1,1,0,1]
res = data.reset_index(level='number').groupby(mask).rolling('2d').mean()

The desired result would be:

              number  col
  time                   
0 2000-05-01     0.0  0.0
  2000-05-01     1.0  1.0
  2000-05-03     0.0  4.0
1 2000-05-02     0.0  2.0
  2000-05-02     1.0  3.0
  2000-05-03     1.0  4.0

However, the actual result is:

                number       col
  time                          
0 2000-05-01  0.000000  0.000000
  2000-05-01  0.500000  0.500000
  2000-05-03  0.000000  4.000000
1 2000-05-02  0.000000  2.000000
  2000-05-02  0.500000  2.500000
  2000-05-03  0.666667  3.333333

Answer 1

The problem is that the groupby produces groups of 4 rows each:

for i, item in data.groupby(mask):
    print(item)

Gives:

                    0
time       number    
2000-05-01 0        0
2000-05-03 1        5
2000-05-06 0       10
2000-05-08 1       15
                    0
time       number    
2000-05-01 1        1
2000-05-04 0        6
2000-05-06 1       11
2000-05-09 0       16
.....      ..      ...

But you are then passing 10 values to the rolling function through `on`:

print(mult_index.levels[0])
DatetimeIndex(['2000-05-01', '2000-05-02', '2000-05-03', '2000-05-04',
               '2000-05-05', '2000-05-06', '2000-05-07', '2000-05-08',
               '2000-05-09', '2000-05-10'],
              dtype='datetime64[ns]', name='time', freq='24H')
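
The length mismatch can be reproduced in isolation. The following is a minimal sketch, not the pandas internals themselves: it takes one 4-row group and tries to build a Series from the 10 values of `mult_index.levels[0]` on that group's index, which is roughly what the last frames of the traceback do. The names `first_group` and `error_message` are only illustrative, and the exact wording of the message differs between pandas versions.

```python
import pandas as pd

# Same setup as in the question ("D" is a daily frequency, i.e. 24 hours).
time = pd.date_range("2000-05-01", freq="D", periods=10)
mult_index = pd.MultiIndex.from_product([time, [0, 1]], names=["time", "number"])
data = pd.DataFrame({"col": range(20)}, index=mult_index)
mask = list(range(5)) * 4

# Each group produced by the mask has 4 rows ...
first_group = data.groupby(mask).get_group(0)
print(len(first_group))

# ... but the first index level holds 10 unique timestamps.
print(len(mult_index.levels[0]))

# Pairing 10 values with a 4-row index fails with a ValueError.
try:
    pd.Series(mult_index.levels[0].to_numpy(), index=first_group.index)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```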

Try this instead, which moves `number` out of the index so that `time` is the only index level:

time = pd.date_range('2000-05-01', freq='24H', periods=10)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(20), index=mult_index)
data.columns=["col"]
mask = list(range(5)) * 4
res = data.reset_index(level='number').groupby(mask).rolling('2d').mean()
res

Output:

   time        number   col
0  2000-05-01     0.0   0.0
0  2000-05-03     1.0   5.0
0  2000-05-06     0.0  10.0
0  2000-05-08     1.0  15.0
1  2000-05-01     1.0   1.0
1  2000-05-04     0.0   6.0
1  2000-05-06     1.0  11.0
1  2000-05-09     0.0  16.0
2  2000-05-02     0.0   2.0
2  2000-05-04     1.0   7.0
2  2000-05-07     0.0  12.0
2  2000-05-09     1.0  17.0
3  2000-05-02     1.0   3.0
3  2000-05-05     0.0   8.0
3  2000-05-07     1.0  13.0
3  2000-05-10     0.0  18.0
4  2000-05-03     0.0   4.0
4  2000-05-05     1.0   9.0
4  2000-05-08     0.0  14.0
4  2000-05-10     1.0  19.0
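
The EDIT in the question asks that rows with number 0 and number 1 not be averaged together. One possible way to get there, which is not part of the answer above, is to group by the mask and by `number` at the same time, so that each rolling window only sees rows with a single `number` value. The sketch below uses the small example from the EDIT with a 2-day window (the window for which the desired values listed there work out). The helper column name `group` is an assumption made for illustration.

```python
import pandas as pd

# Small example from the question's EDIT.
time = pd.date_range("2000-05-01", freq="D", periods=3)
mult_index = pd.MultiIndex.from_product([time, [0, 1]], names=["time", "number"])
data = pd.DataFrame({"col": range(6)}, index=mult_index)
mask = [0, 0, 1, 1, 0, 1]

# Keep "time" as the only index level and store the mask as a regular
# column ("group" is a hypothetical name chosen for this sketch).
flat = data.reset_index(level="number")
flat["group"] = mask

# Grouping by both keys keeps number 0 and number 1 in separate windows.
res = flat.groupby(["group", "number"])["col"].rolling("2D").mean()
print(res)
```

The result is a Series indexed by `group`, `number`, and `time`. Its rows come out ordered by group key, so the order differs from the desired table in the question even though the values match.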

Solution 1
0 2020-05-22 14:26:43
