Non-overlapping rolling windows in pandas groupby
I want to create a non-overlapping rolling (or sliding) window in a pandas groupby.
import numpy as np
import pandas as pd
df1 = pd.DataFrame( {'a1':['A','A','B','B','B','B','B','B'],'a2':[1,1,1,2,2,2,2,2], 'b':[1,2,5,5,5,4,6,2]})
For an overlapping rolling window, I can do this:
df1.groupby(['a1','a2']).rolling(2).mean()
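To make "overlapping" concrete, here is a small sketch on the b values of the (B, 2) group from the question. It shows that consecutive rolling windows share rows. The helper full_chunk_means is a hypothetical name introduced for illustration: it keeps every window-th rolling result, which covers the complete non-overlapping chunks but drops (rather than NaN-fills) a trailing incomplete chunk, so on its own it does not reproduce the desired output below.

```python
import pandas as pd

# The b values of the (a1='B', a2=2) group from the question
s = pd.Series([5, 5, 4, 6, 2])

# Overlapping: each window of 2 rows starts one row after the previous one,
# so consecutive windows share a row
print(s.rolling(2).mean().tolist())  # [nan, 5.0, 4.5, 5.0, 4.0]


def full_chunk_means(series, window):
    """Hypothetical helper: means of the complete non-overlapping chunks.

    The rolling window ending at row window-1, 2*window-1, ... covers exactly
    one non-overlapping chunk, so taking every window-th rolling result keeps
    only those chunks. A trailing incomplete chunk is dropped, not set to NaN.
    """
    return series.rolling(window).mean().iloc[window - 1::window].tolist()


print(full_chunk_means(s, 2))  # [5.0, 5.0]
```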
But is there any way to make it non-overlapping?
The output should look like this:
pd.DataFrame({'a1': ['A', 'B', 'B', 'B', 'B'], 'a2': [1, 1, 2, 2, 2], 'b': [1.5, np.nan, 5, 5, np.nan]})
Explanation
When a1 is A and a2 is 1, the values of b are 1 and 2. Their mean, (1 + 2) / 2, is 1.5.
When a1 is B and a2 is 1, the only value of b is 5. Because this group has fewer values of b than the length of the sliding window, we get NaN.
When a1 is B and a2 is 2, the values of b are 5, 5, 4, 6, 2. Since the sliding window is 2, the means are (5 + 5) / 2 = 5 and (4 + 6) / 2 = 5. The last value is NaN because only one value (2) remains, which is shorter than the sliding window.
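The arithmetic above can be checked with a minimal plain-Python sketch (no pandas). The function name non_overlapping_means is hypothetical; it splits a list into consecutive chunks of length window and returns NaN for a trailing chunk that is too short, matching the rule described in the question.

```python
import math


def non_overlapping_means(values, window=2):
    """Hypothetical helper: mean of each consecutive, non-overlapping chunk."""
    means = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        # A chunk shorter than the window gets NaN, as in the expected output
        means.append(sum(chunk) / window if len(chunk) == window else math.nan)
    return means


print(non_overlapping_means([1, 2]))           # [1.5]
print(non_overlapping_means([5]))              # [nan]
print(non_overlapping_means([5, 5, 4, 6, 2]))  # [5.0, 5.0, nan]
```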
Well, one approach (not very elegant) is:
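import numpy as np
def non_overlapping_mean(x, window=2):
    # Label rows 0, 0, 1, 1, 2, ... (one label per chunk) and average each chunk; short chunks become NaN
    return x.groupby(np.arange(len(x)) // window).apply(
        lambda chunk: np.nan if len(chunk) < window else chunk.mean())
# droplevel(-1) drops the chunk label, leaving a1 and a2 in the index
res = df1.groupby(['a1', 'a2'])['b'].apply(non_overlapping_mean).droplevel(-1).reset_index()
print(res)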
Output
a1 a2 b
0 A 1 1.5
1 B 1 NaN
2 B 2 5.0
3 B 2 5.0
4 B 2 NaN
The main idea is to split each group's rows into consecutive chunks of length window, which is done here:
x.groupby(np.arange(len(x)) // window)
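The same chunking idea can be sketched without the nested apply, assuming a pandas version where groupby accepts a mix of column names and a Series as keys. This swaps in GroupBy.cumcount (each row's position within its group) in place of np.arange(len(x)) inside the per-group function; the function name non_overlapping_mean_vectorized is hypothetical. Short chunks are masked to NaN by comparing each chunk's size to the window.

```python
import pandas as pd

df1 = pd.DataFrame({'a1': ['A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
                    'a2': [1, 1, 1, 2, 2, 2, 2, 2],
                    'b': [1, 2, 5, 5, 5, 4, 6, 2]})


def non_overlapping_mean_vectorized(df, keys, col, window=2):
    """Hypothetical helper: non-overlapping chunk means per group."""
    # Position of each row within its group (0, 1, 2, ...), integer-divided
    # by the window size, gives a chunk label: 0, 0, 1, 1, 2, ...
    chunk = df.groupby(keys).cumcount() // window
    agg = df.groupby(keys + [chunk])[col].agg(['mean', 'size'])
    # Chunks with fewer rows than the window become NaN
    out = agg['mean'].where(agg['size'] == window)
    # Drop the chunk label so only the group keys remain as columns
    return out.droplevel(-1).reset_index(name=col)


print(non_overlapping_mean_vectorized(df1, ['a1', 'a2'], 'b'))
```

Because cumcount numbers rows in their original order within each group, the chunks are the same consecutive runs that np.arange(len(x)) // window produces inside the apply-based version.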