
Pandas rolling mean based on groupby with multiple columns

I have a long-format DataFrame with repeated values in two columns and data in another column. I want to find simple moving averages (SMAs) for each group. My problem is that rolling() simply ignores the fact that the data is grouped by two columns.

Here is some dummy data and code.

import numpy as np
import pandas as pd

dtix=pd.Series(pd.date_range(start='1/1/2019', periods=4) )
df=pd.DataFrame({'ix1':np.repeat([0,1],4), 'ix2':pd.concat([dtix,dtix]), 'data':np.arange(0,8) })
df
   ix1        ix2  data
0    0 2019-01-01     0
1    0 2019-01-02     1
2    0 2019-01-03     2
3    0 2019-01-04     3
0    1 2019-01-01     4
1    1 2019-01-02     5
2    1 2019-01-03     6
3    1 2019-01-04     7

Now when I perform a grouped rolling mean on this data, I get output like this:

df.groupby(['ix1','ix2']).agg({'data':'mean'}).rolling(2).mean()
                data
ix1 ix2
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   3.5
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5

Desired output: what I would actually like to have is this:

                 sma
ix1 ix2
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   NaN
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5

I would appreciate your help with this.

Use another groupby on the first index level (ix1) before rolling:

df1 = (df.groupby(['ix1','ix2'])
         .agg({'data':'mean'})
         .groupby(level=0, group_keys=False)
         .rolling(2)
         .mean())
print (df1)
                data
ix1 ix2             
0   2019-01-01   NaN
    2019-01-02   0.5
    2019-01-03   1.5
    2019-01-04   2.5
1   2019-01-01   NaN
    2019-01-02   4.5
    2019-01-03   5.5
    2019-01-04   6.5
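
If the shape of the index returned by a grouped rolling() differs in your pandas version (this is an assumption; the group key may be prepended as an extra level), a transform-based variant is one possible alternative. This is only a sketch: the names agg and sma are illustrative, and ignore_index=True is added here so the dummy frame gets a unique row index.

```python
import numpy as np
import pandas as pd

# Rebuild the dummy data from the question (ignore_index=True is an addition
# so the row index is unique).
dtix = pd.Series(pd.date_range(start='1/1/2019', periods=4))
df = pd.DataFrame({
    'ix1': np.repeat([0, 1], 4),
    'ix2': pd.concat([dtix, dtix], ignore_index=True),
    'data': np.arange(0, 8),
})

# Aggregate per (ix1, ix2) as in the question.
agg = df.groupby(['ix1', 'ix2']).agg({'data': 'mean'})

# Group again by the first index level and roll inside each group.
# transform() returns a result aligned to agg's own (ix1, ix2) index.
sma = (agg.groupby(level='ix1')['data']
          .transform(lambda s: s.rolling(2).mean())
          .rename('sma')
          .to_frame())
print(sma)
```

Because transform() keeps the original index, the result has the same two-level (ix1, ix2) index as the aggregated frame, with the first row of each ix1 group set to NaN.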

In your solution, the aggregation returns a one-column DataFrame, so the chained rolling runs over all rows rather than within each group as needed:

print(df.groupby(['ix1','ix2']).agg({'data':'mean'}))
                data
ix1 ix2             
0   2019-01-01     0
    2019-01-02     1
    2019-01-03     2
    2019-01-04     3
1   2019-01-01     4
    2019-01-02     5
    2019-01-03     6
    2019-01-04     7
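
To make the boundary effect concrete, here is a small sketch that builds the aggregated frame shown above directly (the names idx, agg, flat and per_group are illustrative) and compares a plain rolling mean with a per-group one at the first row of group ix1 = 1. Only values are compared, since the index of a grouped rolling result may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same shape as the aggregated frame above: 2 groups x 4 dates, data 0..7.
idx = pd.MultiIndex.from_product(
    [[0, 1], pd.date_range('2019-01-01', periods=4)],
    names=['ix1', 'ix2'])
agg = pd.DataFrame({'data': np.arange(8)}, index=idx)

# Plain rolling: the window slides over all 8 rows, so the first row of
# group 1 averages the last value of group 0 (3) with its own value (4).
flat = agg['data'].rolling(2).mean()
first_of_group_1 = flat.loc[(1, pd.Timestamp('2019-01-01'))]
print(first_of_group_1)  # 3.5

# Per-group rolling: each group starts a fresh window, so that row is NaN.
per_group = agg.groupby(level='ix1')['data'].rolling(2).mean().to_numpy()
print(per_group[4])  # nan
```

The 3.5 in the flat result is exactly the unwanted value from the question's output, while the per-group version leaves that position empty.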

