[英]Moving average by column / year - python, pandas
I need to built a moving average over column "total_medals" by country [noc] for all previous years - my daata looks like:我需要在所有前几年按国家 [noc] 在列“total_medals”上建立移动平均线 - 我的数据看起来像:
medal Bronze Gold Medal Silver **total_medals**
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
I want to have a moving average per country and year (ie for ALG: 1984 Avg (total_medals)=2.0; 1992 Avg(total_medals) = (2.0+6.0)/2 = 4.0; 1996 Acg(total_medals) = (2.0+6.0+7.0)/3 = 5.0) - moving average should appear in new column (next to total_medals).我想要每个国家和年份的移动平均值(即 ALG:1984 Avg (total_medals)=2.0; 1992 Avg(total_medals) = (2.0+6.0)/2 = 4.0; 1996 Acg(total_medals) = (2.0+6.0 +7.0)/3 = 5.0) - 移动平均线应出现在新列中(total_medals 旁边)。
Additionally, for each country & year combination new column called "performance" should be the fraction of "total_medals" divided by "moving average"此外,对于每个国家和年份组合,名为“表现”的新列应该是“总奖牌”除以“移动平均线”的分数
Sample dataframe :样品 dataframe :
print(df)
medal Bronze Gold Medal Silver
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 NaN 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 NaN 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
Use DataFrame.groupby
+ expanding
:使用
DataFrame.groupby
+ expanding
:
df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)
medal Bronze Gold Medal Silver total_medals
noc year
ALG 1984 2.0 NaN NaN NaN 2.0 2.000000
1992 4.0 2.0 NaN NaN 6.0 4.000000
1996 2.0 1.0 NaN 4.0 7.0 5.000000
ANZ 1984 2.0 15.0 NaN 2.0 19.0 19.000000
1992 3.0 5.0 NaN 2.0 10.0 14.500000
1996 1.0 2.0 NaN 2.0 5.0 11.333333
ARG 1984 2.0 6.0 NaN 3.0 11.0 11.000000
1992 5.0 3.0 NaN 24.0 32.0 21.500000
1992 3.0 7.0 NaN 5.0 15.0 19.333333
bonze lagged和尚落后
s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s
You could create a function for this...您可以为此创建一个 function ...
def lagged_medals(type_of_medal):
s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
s.index=s.index.droplevel()
df[f'{type_of_medal}_lagged']=s
lagged_medals('Silver')
#print(df)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.