Pandas DataFrame Groupby two columns and add column for moving average
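I have a DataFrame that I want to group by multiple columns, and then add a calculated column (an average) based on that grouping. Can anyone help?
The grouping itself works fine, but adding the calculated (rolling average) column is proving to be a hassle.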
import pandas as pd
import numpy as np
# Each inner list becomes a row; .T transposes them into the five columns
df = pd.DataFrame([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], list('AAAAAAAABBBBBBBB'), ['RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW'], ['1','1','1','1','2','2','2','2','1','1','1','1','2','2','2','2'],[100,112,99,120,105,114,100,150,200,134,167,150,134,189,172,179]]).T
df.columns = ['id','Station','Train','month_code','total']
# Count rows per (Station, Train, month_code, total), then keep the max 'total' per (Station, Train, month_code)
df2 = df.groupby(['Station','Train','month_code','total']).size().reset_index().groupby(['Station','Train','month_code'])['total'].max()
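A side note on the sample data: because the frame is built from a list of rows and then transposed with `.T`, every column ends up with `object` dtype, including `id` and `total`. Numeric dtypes are generally the safer choice for aggregations and rolling windows. A minimal sketch, assuming you are free to change how the sample is built, constructs the same data column by column so that `id` and `total` come out as integers:

```python
import pandas as pd

# Same sample as the question, built column by column so numeric columns get an integer dtype
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

# 'id' and 'total' are integer columns; the rest stay as strings (object)
print(df.dtypes)
```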
The result I'm looking for is something like this:
Station Train   month_code  total  average
A       BLUE    1           112
                2           114    113
        GREEN   1           99     106.5
                2           100    99.5
        RED     1           100    100
                2           105    102.5
        YELLOW  1           120    112.5
                2           150    135
B       BLUE    1           134    142
                2           189    161.5
        GREEN   1           167    178
                2           172    169.5
        RED     1           200    186
                2           134    167
        YELLOW  1           150    142
                2           179    164.5
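Reading the desired output row by row, each `average` appears to be the mean of the current `total` and the `total` on the previous row, with the window running straight across `Train` and `Station` boundaries. For example, `A GREEN 1` gives (114 + 99) / 2 = 106.5 by pairing it with `A BLUE 2`. A quick check of that reading, using only the numbers copied from the table:

```python
# Totals copied from the desired-output table, in display order
totals = [112, 114, 99, 100, 100, 105, 120, 150,
          134, 189, 167, 172, 200, 134, 150, 179]
# Averages from the same table (the first row has none)
expected = [113, 106.5, 99.5, 100, 102.5, 112.5, 135, 142,
            161.5, 178, 169.5, 186, 167, 142, 164.5]

# 2-row moving average: mean of each value and the one before it
pairwise = [(a + b) / 2 for a, b in zip(totals, totals[1:])]
print(pairwise == expected)  # True
```

So the target is a plain 2-row rolling mean over the grouped rows in sorted order, not a window that restarts inside each group.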
How about changing the initial groupby so that it keeps the column name 'total':
df3 = df.groupby(['Station','Train','month_code']).sum()
>>> df3.head()
                          id  total
Station Train month_code
A       BLUE  1            2    112
              2            6    114
        GREEN 1            3     99
              2            7    100
        RED   1            1    100
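The column name survives here because `.sum()` aggregates each remaining column and keeps its label, whereas the question's `.size()` returns an unnamed count Series, which `reset_index()` turns into a column labelled `0`. A small sketch contrasting the two, on a hypothetical cut-down frame used only for illustration:

```python
import pandas as pd

# Hypothetical cut-down sample: three rows, integer totals
df = pd.DataFrame({
    'Train': ['RED', 'RED', 'BLUE'],
    'month_code': ['1', '2', '1'],
    'total': [100, 105, 112],
})

# size() counts rows per group; the count Series has no name, so it becomes column 0
sized = df.groupby(['Train', 'month_code']).size().reset_index()
# sum() aggregates the non-key columns and keeps their labels
summed = df.groupby(['Train', 'month_code']).sum()

print(list(sized.columns))   # ['Train', 'month_code', 0]
print(list(summed.columns))  # ['total']
```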
Then take a rolling mean over the total column:
df3['average'] = df3['total'].rolling(2).mean()
>>> df3.head()
                          id  total  average
Station Train month_code
A       BLUE  1            2    112      NaN
              2            6    114    113.0
        GREEN 1            3     99    106.5
              2            7    100     99.5
        RED   1            1    100    100.0
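Putting the answer's steps together, a self-contained sketch follows. It assumes the sample is built with numeric columns rather than via `.T`. It also adds a variant column, `avg_in_train` (a hypothetical name), for the case where you want the window to restart inside each Station/Train group instead of crossing group boundaries; that variant is an assumption about intent, not part of the requested output.

```python
import pandas as pd

# Sample data built column by column so 'id' and 'total' are numeric
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

# Group on the three key columns; sum() keeps the 'id' and 'total' labels
df3 = df.groupby(['Station', 'Train', 'month_code']).sum()

# 2-row rolling mean over the whole frame (crosses group boundaries, as requested)
df3['average'] = df3['total'].rolling(2).mean()

# Variant: restart the 2-row window within each (Station, Train) group
df3['avg_in_train'] = (
    df3.groupby(level=['Station', 'Train'])['total']
       .transform(lambda s: s.rolling(2).mean())
)

print(df3.head())
```

With only two months per group, the per-group variant yields NaN for each group's first month and the two-month mean for its second.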
You can then drop the id column if you don't need it.
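For the drop step, a minimal sketch on a hypothetical two-row stand-in for `df3` (same MultiIndex and columns as the answer's output) looks like this:

```python
import pandas as pd

# Hypothetical two-row stand-in for df3: MultiIndex rows, 'id' and 'total' columns
idx = pd.MultiIndex.from_tuples(
    [('A', 'BLUE', '1'), ('A', 'BLUE', '2')],
    names=['Station', 'Train', 'month_code'],
)
df3 = pd.DataFrame({'id': [2, 6], 'total': [112, 114]}, index=idx)
df3['average'] = df3['total'].rolling(2).mean()

# drop() returns a new frame, so reassign to keep the change
df3 = df3.drop(columns='id')
print(df3.columns.tolist())  # ['total', 'average']
```

Alternatively, selecting only the column you need before aggregating, as in `df.groupby(['Station','Train','month_code'])[['total']].sum()`, avoids summing `id` in the first place.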