Pandas rolling mean based on groupby with multiple columns

Question

I have a long-format DataFrame with repeated values in two columns and data in a third column. I want to compute a simple moving average (SMA) for each group. My problem is that rolling() simply ignores the fact that the data is grouped by two columns.

Here is some dummy data and code.

import numpy as np
import pandas as pd

dtix=pd.Series(pd.date_range(start='1/1/2019', periods=4) )
df=pd.DataFrame({'ix1':np.repeat([0,1],4), 'ix2':pd.concat([dtix,dtix]), 'data':np.arange(0,8) })
df

   ix1        ix2  data
0    0 2019-01-01     0
1    0 2019-01-02     1
2    0 2019-01-03     2
3    0 2019-01-04     3
0    1 2019-01-01     4
1    1 2019-01-02     5
2    1 2019-01-03     6
3    1 2019-01-04     7

Now, when I perform a grouped rolling mean on this data, I get output like this:

df.groupby(['ix1','ix2']).agg({'data':'mean'}).rolling(2).mean()

                data
ix1 ix2
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   3.5
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5

Desired output: what I would actually like to have is this:

                 sma
ix1 ix2
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   NaN
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5

I would appreciate your help with this.

Answer 1

Add another groupby on the first index level (ix1) before calling rolling:

df1 = (df.groupby(['ix1','ix2'])
         .agg({'data':'mean'})
         .groupby(level=0, group_keys=False)
         .rolling(2)
         .mean())
print(df1)
                data
ix1 ix2             
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   NaN
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5
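
A variant that may be easier to reason about, offered as a sketch and not as part of the answer above: transform returns a result aligned to the index it was called on, so the output keeps the (ix1, ix2) index regardless of how a given pandas version arranges group keys after groupby(...).rolling(...). The names agg and sma are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question.
dtix = pd.Series(pd.date_range(start='1/1/2019', periods=4))
df = pd.DataFrame({'ix1': np.repeat([0, 1], 4),
                   'ix2': pd.concat([dtix, dtix]),
                   'data': np.arange(0, 8)})

# Step 1: aggregate to one row per (ix1, ix2) pair.
agg = df.groupby(['ix1', 'ix2']).agg({'data': 'mean'})

# Step 2: roll within each ix1 group; transform keeps agg's index,
# so the first row of every group has no previous value and becomes NaN.
sma = (agg.groupby(level='ix1')['data']
          .transform(lambda s: s.rolling(2).mean())
          .rename('sma')
          .to_frame())
print(sma)
```

Because each window is built inside a single ix1 group, the first date of ix1 == 1 is NaN instead of 3.5.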

In your solution, the aggregation returns a one-column DataFrame, so the chained rolling works across all rows instead of per group as needed:

print(df.groupby(['ix1','ix2']).agg({'data':'mean'}))
                data
ix1 ix2             
0   2019-01-01     0
    2019-01-02     1
    2019-01-03     2
    2019-01-04     3
1   2019-01-01     4
    2019-01-02     5
    2019-01-03     6
    2019-01-04     7
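
To see the boundary effect concretely, the following sketch (the names agg, flat and leaked are illustrative) applies the ungrouped rolling(2) to the aggregated frame and inspects the first row of ix1 == 1. Its window still contains the last row of ix1 == 0.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question.
dtix = pd.Series(pd.date_range(start='1/1/2019', periods=4))
df = pd.DataFrame({'ix1': np.repeat([0, 1], 4),
                   'ix2': pd.concat([dtix, dtix]),
                   'data': np.arange(0, 8)})

agg = df.groupby(['ix1', 'ix2']).agg({'data': 'mean'})

# rolling() on the aggregated frame only sees row order, not the ix1 level.
flat = agg['data'].rolling(2).mean()

# Row 4 is (ix1=1, 2019-01-01); its window is rows 3 and 4, i.e. values 3 and 4.
leaked = flat.iloc[4]
print(leaked)  # 3.5
```

The value 3.5 is the mean of the last value of group 0 and the first value of group 1, which is exactly the row that differs from the desired output.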

0 Accepted 2019-06-11 08:29:48