
Non-overlapping rolling windows in pandas groupby

I want to create a non-overlapping rolling (sliding) window within a pandas groupby.

import pandas as pd
df1 = pd.DataFrame( {'a1':['A','A','B','B','B','B','B','B'],'a2':[1,1,1,2,2,2,2,2], 'b':[1,2,5,5,5,4,6,2]})

For an overlapping rolling window, I can do this:

df1.groupby(['a1','a2']).rolling(2).mean()

But is there any way to make it non-overlapping?

The output should look like this:

pd.DataFrame({'a1':['A','B','B','B','B'],'a2':[1,1,2,2,2],'b':[1.5,float('nan'),5,5,float('nan')]})

Explanation

When a1 is A and a2 is 1, the values of b are 1 and 2. Averaging them gives 1.5.
When a1 is B and a2 is 1, the only value of b is 5. As the number of values is less than the length of the sliding window, we get NaN.
When a1 is B and a2 is 2, the values of b are 5, 5, 4, 6, 2. As the sliding window is 2, we get (5+5)/2=5 and (4+6)/2=5. The last value is NaN, as the remaining length is less than the sliding window.
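To make the difference concrete, here is a small sketch on the b values of the (B, 2) group alone. The variable names are only illustrative. It shows that the overlapping result averages every adjacent pair, and that taking every second entry of it happens to recover the complete non-overlapping windows, though not the trailing NaN the question asks for.

```python
import pandas as pd

window = 2
b = pd.Series([5, 5, 4, 6, 2])  # b values where a1 == 'B' and a2 == 2

# Overlapping windows: each position averages itself and the previous value
overlapping = b.rolling(window).mean()
print(overlapping.tolist())  # [nan, 5.0, 4.5, 5.0, 4.0]

# Keeping every `window`-th result, starting at the first complete window,
# leaves only the non-overlapping pairs (5, 5) and (4, 6)
full_windows = overlapping.iloc[window - 1::window]
print(full_windows.tolist())  # [5.0, 5.0]
```

The leftover value 2 has no partner, so this slice simply drops it instead of reporting NaN.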

Well, one approach (not very elegant) is to do:

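import numpy as np

def non_overlapping_mean(x, window=2):
    # Label the first `window` rows as chunk 0, the next `window` rows as chunk 1, and so on
    chunks = np.arange(len(x)) // window
    # Average each chunk; a chunk shorter than `window` yields NaN
    return x.groupby(chunks).apply(lambda c: np.nan if len(c) < window else c.mean())


res = df1.groupby(['a1', 'a2'])['b'].apply(non_overlapping_mean).droplevel(-1).reset_index()
print(res)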

Output

  a1  a2    b
0  A   1  1.5
1  B   1  NaN
2  B   2  5.0
3  B   2  5.0
4  B   2  NaN

The main idea is to group each series into consecutive chunks of window rows, which is done here:

x.groupby(np.arange(len(x)) // window)
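As a rough illustration of what that expression produces, the sketch below prints the chunk labels for a five-element series and then masks the incomplete chunk. The names labels, sizes and chunk_means are assumptions made for this example, and the where-based masking is a variation on the lambda used in the answer, not the answer's own code.

```python
import numpy as np
import pandas as pd

window = 2
x = pd.Series([5, 5, 4, 6, 2])  # b values for the group a1 == 'B', a2 == 2

# Integer division maps positions 0,1 -> chunk 0; 2,3 -> chunk 1; 4 -> chunk 2
labels = np.arange(len(x)) // window
print(labels)  # [0 0 1 1 2]

# Mean per chunk, kept only where the chunk holds a full `window` of rows
grouped = x.groupby(labels)
sizes = grouped.size()
chunk_means = grouped.mean().where(sizes == window)
print(chunk_means.tolist())  # [5.0, 5.0, nan]
```

Because the labels depend only on row position, the same expression works for any window length, and the last chunk is short whenever len(x) is not a multiple of window.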
