Python calculation for moving average
I have the following dataset:

```python
import pandas as pd
data = {'Category': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
        'Date' : [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5],
        'Count': [1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,6]}
df = pd.DataFrame(data)
```
I am trying to calculate, for each row, the average Count of the 3 rows with the next later Dates in the same Category, excluding the current row (that is, a rolling window computed from the newest date to the oldest). If there are fewer than 3 such rows to average, the result should be 0.
The expected result is shown below. For example, for Category A on Date 1, the average is the mean Count of Dates 2, 3 and 4 of Category A: (2 + 3 + 4) / 3 = 3.
| Category | Date | Count | Average |
|---|---|---|---|
| A | 1 | 1 | 3 |
| A | 2 | 2 | 4 |
| A | 3 | 3 | 3.333333333 |
| A | 4 | 4 | 2.666666666 |
| A | 5 | 5 | 0 |
| A | 6 | 1 | 0 |
| B | 1 | 2 | 4 |
| B | 2 | 3 | 5 |
| B | 3 | 4 | 0 |
| B | 4 | 5 | 0 |
| B | 5 | 6 | 0 |
| C | 1 | 1 | 3 |
| C | 2 | 2 | 4.333333333 |
| C | 3 | 3 | 0 |
| C | 4 | 4 | 0 |
| C | 5 | 6 | 0 |
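The rule as stated can be written out directly as a brute-force sketch, which is handy for checking a vectorised solution. For each row it takes the Count values of the next 3 later Dates in the same Category and averages them, returning 0 when fewer than 3 exist. The function name `expected_average` is hypothetical, and the sketch assumes each Category has at most one row per Date.

```python
import pandas as pd

data = {'Category': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
        'Date': [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5],
        'Count': [1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,6]}
df = pd.DataFrame(data)

def expected_average(frame, window=3):
    """Hypothetical reference: mean Count of the next `window` later dates, else 0."""
    averages = []
    for _, row in frame.iterrows():
        # Rows of the same category with a later date, oldest first.
        later = frame[(frame['Category'] == row['Category']) & (frame['Date'] > row['Date'])]
        later = later.sort_values('Date').head(window)
        # Fewer than `window` later rows means there is no full window: report 0.
        averages.append(later['Count'].mean() if len(later) == window else 0.0)
    return pd.Series(averages, index=frame.index)

df['Average'] = expected_average(df)
print(df)
```

This loops over every row, so it is only meant as a check on small data. Under this reading, Category A on Date 4 has only two later dates (5 and 6), so it yields 0.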
I tried the following, but it did not produce the expected result:

```python
df['average'] = df.groupby(['Category'])['Count'].transform(lambda x: x.rolling(3, 1).mean())
```
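To see why this attempt does not match, here is a small sketch on Category A alone (Count values 1, 2, 3, 4, 5, 1 for Dates 1 to 6). `rolling(3, 1)` looks backward over older dates, includes the current row, and with `min_periods=1` averages partial windows instead of returning NaN.

```python
import pandas as pd

# Category A only, indexed by Date.
counts = pd.Series([1, 2, 3, 4, 5, 1], index=[1, 2, 3, 4, 5, 6], name='Count')

# The attempted window: the current row plus up to 2 earlier dates, partial windows allowed.
attempt = counts.rolling(3, min_periods=1).mean()
print(attempt.tolist())
```

For Date 1 the attempt gives 1.0 (the current row alone), whereas the expected value is 3, the mean of Dates 2, 3 and 4.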
You can use `rolling` in combination with `shift` and `sort_values` as follows:
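```python
def reverse_roll(group):
    # Newest first: rolling(3, 3) averages a row with the two later dates above it,
    # and shift() moves each mean down one row so the current row is excluded.
    avg = group.sort_values('Date', ascending=False)['Count'].rolling(3, 3).mean().shift()
    # Rows with fewer than 3 later dates are NaN; report them as 0.
    # assign() returns a new frame instead of mutating the group passed in by apply.
    return group.assign(Count=avg.fillna(0.0)).sort_values('Date', ascending=True)

# group_keys=False keeps the original flat index on pandas >= 2.0.
df.groupby('Category', group_keys=False).apply(reverse_roll)
```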
Above, `rolling(3, 3)` forces the rolling window to always contain exactly 3 rows and never fewer (the second argument is `min_periods`). With the rows sorted newest first, the first 2 rows of each category therefore get NaN. `shift` then moves each mean down one row so that the current row is not included in its own window, which leaves the 3 newest dates of each category as NaN; `fillna` sets these to 0.
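To make these steps concrete, here is a sketch that traces them on Category A alone (Dates 1 to 6, Count 1, 2, 3, 4, 5, 1). The variable names are illustrative only; the final frame shows the window means, the shifted values and the resulting average side by side, aligned on the original row index.

```python
import pandas as pd

group = pd.DataFrame({'Date': [1, 2, 3, 4, 5, 6], 'Count': [1, 2, 3, 4, 5, 1]})

# Step 1: newest first, so each window of 3 covers a row and the two later dates above it.
newest_first = group.sort_values('Date', ascending=False)['Count']

# Step 2: full windows only; the two newest rows have no complete window and get NaN.
window_mean = newest_first.rolling(3, min_periods=3).mean()

# Step 3: shift down one row so each row receives the mean of the 3 dates after it.
shifted = window_mean.shift()

# Step 4: rows with fewer than 3 later dates become 0; sort_index restores the Date-ascending order.
average = shifted.fillna(0.0).sort_index()

print(pd.DataFrame({'Date': group['Date'], 'rolling': window_mean,
                    'shifted': shifted, 'Average': average}))
```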
Result:

```
    Category  Date     Count
0          A     1  3.000000
1          A     2  4.000000
2          A     3  3.333333
3          A     4  0.000000
4          A     5  0.000000
5          A     6  0.000000
6          B     1  4.000000
7          B     2  5.000000
8          B     3  0.000000
9          B     4  0.000000
10         B     5  0.000000
11         C     1  3.000000
12         C     2  4.333333
13         C     3  0.000000
14         C     4  0.000000
15         C     5  0.000000
```
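If you would rather keep the original `Count` column and add the result as a separate `Average` column, as in the expected-result table, one alternative sketch is to reverse each group's Counts inside `groupby(...).transform` instead of calling `apply`. It assumes rows can be put in `Date` order within each Category (the sketch sorts first), and the helper name `next_three_mean` is hypothetical.

```python
import pandas as pd

data = {'Category': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
        'Date': [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5],
        'Count': [1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,6]}
df = pd.DataFrame(data)

def next_three_mean(counts):
    # Hypothetical helper: counts are one category's Count values in ascending Date order.
    # Reverse to newest first, average full 3-row windows, shift to exclude the current row,
    # then reverse back so the values line up with the original rows.
    avg = counts.iloc[::-1].rolling(3, min_periods=3).mean().shift()
    return avg.iloc[::-1].fillna(0.0)

df = df.sort_values(['Category', 'Date'])
df['Average'] = df.groupby('Category')['Count'].transform(next_three_mean)
print(df)
```

The `Average` values agree with the result above while `Count` is left untouched; using `transform` on the `Count` column also avoids passing the grouping column into `apply`, which recent pandas versions warn about.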