Calculating rolling average per group in pandas df
I have a df like this:
date car model mpg
1 ford focus 10
1 ford fiesta 15
1 ford mustang 20
2 ford focus 13
2 ford fiesta 16
2 ford mustang 27
3 ford focus 13
3 ford mustang 27
4 ford focus 12
4 ford fiesta 17
I would like to add a rolling mean column with window = 2 over date, per group of date, car, model, so that I would have a df like this:
date car model mpg rolling_avg
1 ford focus 10 nan
1 ford fiesta 15 nan
1 ford mustang 20 nan
2 ford focus 13 11.5
2 ford fiesta 16 15.5
2 ford mustang 27 23.5
3 ford focus 13 13
3 ford mustang 27 27
4 ford focus 12 12.5
4 ford fiesta 17 8.5 <- fiesta is not in date=3, so I want (17+0)/2 = 8.5
What I tried:
df_test.groupby(['date','car','model'])[['mpg']].rolling(window=2).mean().reset_index()
date car model level_3 mpg
0 1 ford fiesta 1 NaN
1 1 ford focus 0 NaN
2 1 ford mustang 2 NaN
3 2 ford fiesta 4 NaN
4 2 ford focus 3 NaN
5 2 ford mustang 5 NaN
6 3 ford focus 6 NaN
7 3 ford mustang 7 NaN
8 4 ford fiesta 9 NaN
9 4 ford focus 8 NaN
I am not sure what level_3 stands for. Where is my mistake in trying to achieve the structure I desire?
Here is the data used:
import pandas as pd

df = pd.DataFrame({'date':[1,1,1,2,2,2,3,3,4,4],
                   'car':['ford','ford','ford','ford','ford','ford','ford','ford','ford','ford'],
                   'model':['focus','fiesta','mustang','focus','fiesta','mustang','focus','mustang','focus','fiesta'],
                   'mpg':[10,15,20,13,16,27,13,27,12,17]})
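On the two side questions: grouping by date, car, model puts every row into its own group, so a window of 2 never sees two values and every result is NaN. level_3 is the original row index, which groupby().rolling() keeps as an extra, unnamed index level after the three group keys; reset_index names it level_3 because it sits at position 3. A minimal sketch to check both points, assuming the data above (the names group_sizes, rolled and index_names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car': ['ford'] * 10,
    'model': ['focus', 'fiesta', 'mustang', 'focus', 'fiesta',
              'mustang', 'focus', 'mustang', 'focus', 'fiesta'],
    'mpg': [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Every (date, car, model) combination occurs once, so each group holds one row.
group_sizes = df.groupby(['date', 'car', 'model']).size()

rolled = df.groupby(['date', 'car', 'model'])[['mpg']].rolling(window=2).mean()

# The result index holds the three group keys plus the unnamed original row index.
index_names = list(rolled.index.names)

print(group_sizes.max())           # largest group size
print(index_names)                 # names of the index levels
print(rolled['mpg'].isna().all())  # whether every rolling value is NaN
```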
What works for me is to first reshape the values with DataFrame.set_index and Series.unstack, filling combinations that have no match with 0, then use rolling, reshape back with DataFrame.stack, and finally add the new column with DataFrame.join:
s = (df.set_index(['date','car','model'])['mpg']
.unstack(fill_value=0)
.rolling(window=2)
.mean()
.stack()
.rename('rolling_avg')
)
df = df.join(s, on=['date','car','model'])
print (df)
date car model mpg rolling_avg
0 1 ford focus 10 NaN
1 1 ford fiesta 15 NaN
2 1 ford mustang 20 NaN
3 2 ford focus 13 11.5
4 2 ford fiesta 16 15.5
5 2 ford mustang 27 23.5
6 3 ford focus 13 13.0
7 3 ford mustang 27 27.0
8 4 ford focus 12 12.5
9 4 ford fiesta 17 8.5
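The fill_value=0 is what produces 8.5 in the last row: the missing fiesta at date=3 becomes a real 0 in the wide table. For comparison, a rolling mean taken per car, model over the existing rows only would pair fiesta's date=4 with date=2 instead. A minimal sketch of that difference, assuming the original data with rows already sorted by date (the column names wide_avg and per_group are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car': ['ford'] * 10,
    'model': ['focus', 'fiesta', 'mustang', 'focus', 'fiesta',
              'mustang', 'focus', 'mustang', 'focus', 'fiesta'],
    'mpg': [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})
keys = ['date', 'car', 'model']

# Wide reshape: one column per model, missing combinations filled with 0.
wide = (df.set_index(keys)['mpg']
          .unstack(fill_value=0)
          .rolling(window=2)
          .mean()
          .stack()
          .rename('wide_avg'))
out = df.join(wide, on=keys)

# Alternative: roll within each (car, model) group over the rows that exist.
out['per_group'] = (df.groupby(['car', 'model'])['mpg']
                      .transform(lambda s: s.rolling(window=2).mean()))

# Rows where the two approaches disagree (ignoring the leading NaN rows).
differs = out.index[(out['wide_avg'] != out['per_group'])
                    & out['wide_avg'].notna()].tolist()
print(out)
print(differs)
```

Which of the two is right depends on whether a missing date should count as 0 (as the question asks) or simply be skipped.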
EDIT: If set_index with unstack fails, the data contains duplicates, like:
df = pd.DataFrame({'date':[1,1,1,2,2,2,3,3,4,4],
'car':['ford','ford','ford','ford','ford','ford','ford','ford','ford','ford'],
'model':['focus','focus','mustang','focus','focus','mustang','focus','mustang','focus','fiesta'],
'mpg':[10,15,20,13,16,27,13,27,12,17]})
print (df)
date car model mpg
0 1 ford focus 10 <- dupe 1 ford focus
1 1 ford focus 15 <- dupe 1 ford focus
2 1 ford mustang 20
3 2 ford focus 13 <- dupe 2 ford focus
4 2 ford focus 16 <- dupe 2 ford focus
5 2 ford mustang 27
6 3 ford focus 13
7 3 ford mustang 27
8 4 ford focus 12
9 4 ford fiesta 17
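With these rows, set_index(...).unstack() cannot build the wide table, because two rows would land in the same cell. A minimal sketch of how to spot the offending rows and of the failure itself, assuming the duplicated data above (the exact wording of the error message may differ between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    'date': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car': ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus',
              'mustang', 'focus', 'mustang', 'focus', 'fiesta'],
    'mpg': [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})
keys = ['date', 'car', 'model']

# Rows whose (date, car, model) combination appears more than once.
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)

# Reshaping fails because the index built from the keys is not unique.
try:
    df.set_index(keys)['mpg'].unstack(fill_value=0)
    error = None
except ValueError as exc:
    error = exc
print(error)
```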
In that case you first need unique date, car, model combinations, obtained here by aggregating with sum (or mean, as needed):
df1 = df.pivot_table(index=['date','car'],
columns='model',
values='mpg',
aggfunc='sum',
fill_value=0)
print (df1)
model fiesta focus mustang
date car
1 ford 0 25 20
2 ford 0 29 27
3 ford 0 13 27
4 ford 17 12 0
Then it is possible to use rolling. The output has a different shape from the input data, because each 'date','car','model' combination is now unique:
df1 = (df1.rolling(window=2)
.mean()
.stack(dropna=False)
.rename('rolling_avg')
.reset_index()
)
print (df1)
date car model rolling_avg
0 1 ford fiesta NaN
1 1 ford focus NaN
2 1 ford mustang NaN
3 2 ford fiesta 0.0
4 2 ford focus 27.0
5 2 ford mustang 23.5
6 3 ford fiesta 0.0
7 3 ford focus 21.0
8 3 ford mustang 27.0
9 4 ford fiesta 8.5
10 4 ford focus 12.5
11 4 ford mustang 13.5
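If the rolling values are needed next to the original (duplicated) rows rather than as a separate table, one option is a left merge on the three keys. This is only a sketch under the same assumptions as above (sum as the aggregation, 0 for missing combinations); the names rolled and out are made up for illustration, and duplicated rows simply receive the same rolling value:

```python
import pandas as pd

df = pd.DataFrame({
    'date': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car': ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus',
              'mustang', 'focus', 'mustang', 'focus', 'fiesta'],
    'mpg': [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})
keys = ['date', 'car', 'model']

# Aggregate duplicates, roll over dates, and reshape back to long form.
rolled = (df.pivot_table(index=['date', 'car'],
                         columns='model',
                         values='mpg',
                         aggfunc='sum',
                         fill_value=0)
            .rolling(window=2)
            .mean()
            .stack()
            .rename('rolling_avg')
            .reset_index())

# A left merge keeps every original row; keys without a rolling value get NaN.
out = df.merge(rolled, on=keys, how='left')
print(out)
```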