Moving average by column / year - python, pandas

I need to build a moving average over the column "total_medals" by country [noc] across all previous years. My data looks like this:

    medal     Bronze  Gold  Medal  Silver  total_medals
    noc year
    ALG 1984     2.0   NaN    NaN     NaN           2.0
        1992     4.0   2.0    NaN     NaN           6.0
        1996     2.0   1.0            4.0           7.0
    ANZ 1984     2.0  15.0    NaN     2.0          19.0
        1992     3.0   5.0    NaN     2.0          10.0
        1996     1.0   2.0            2.0           5.0
    ARG 1984     2.0   6.0    NaN     3.0          11.0
        1992     5.0   3.0    NaN    24.0          32.0
        1992     3.0   7.0    NaN     5.0          15.0

I want a moving average per country and year (i.e., for ALG: 1984 Avg(total_medals) = 2.0; 1992 Avg(total_medals) = (2.0+6.0)/2 = 4.0; 1996 Avg(total_medals) = (2.0+6.0+7.0)/3 = 5.0). The moving average should appear in a new column next to total_medals.

Additionally, for each country and year combination, a new column called "performance" should hold "total_medals" divided by the moving average.

Sample dataframe:

print(df)

medal     Bronze  Gold  Medal  Silver  total_medals
noc year
ALG 1984     2.0   NaN    NaN     NaN           2.0
    1992     4.0   2.0    NaN     NaN           6.0
    1996     2.0   1.0    NaN     4.0           7.0
ANZ 1984     2.0  15.0    NaN     2.0          19.0
    1992     3.0   5.0    NaN     2.0          10.0
    1996     1.0   2.0    NaN     2.0           5.0
ARG 1984     2.0   6.0    NaN     3.0          11.0
    1992     5.0   3.0    NaN    24.0          32.0
    1992     3.0   7.0    NaN     5.0          15.0

Use DataFrame.groupby + expanding:

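# Expanding (cumulative) mean of total_medals within each country (index level 0);
# transform keeps the result aligned with the original rows
df['total_mean'] = df.groupby(level=0, sort=False)['total_medals'].transform(lambda x: x.expanding(1).mean())
print(df)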

medal     Bronze  Gold  Medal  Silver  total_medals  total_mean
noc year
ALG 1984     2.0   NaN    NaN     NaN           2.0    2.000000
    1992     4.0   2.0    NaN     NaN           6.0    4.000000
    1996     2.0   1.0    NaN     4.0           7.0    5.000000
ANZ 1984     2.0  15.0    NaN     2.0          19.0   19.000000
    1992     3.0   5.0    NaN     2.0          10.0   14.500000
    1996     1.0   2.0    NaN     2.0           5.0   11.333333
ARG 1984     2.0   6.0    NaN     3.0          11.0   11.000000
    1992     5.0   3.0    NaN    24.0          32.0   21.500000
    1992     3.0   7.0    NaN     5.0          15.0   19.333333
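
The question also asks for a "performance" column, which the expanding mean makes straightforward. The following is a minimal sketch, assuming the index levels are named noc and year as in the sample and that the new columns are called total_mean and performance (both names are illustrative choices, not fixed by the question). It rebuilds only the total_medals column from the sample data so that it runs on its own.

```python
import pandas as pd

# Rebuild only the total_medals column from the sample data
# (the duplicated ARG 1992 row is kept exactly as posted).
index = pd.MultiIndex.from_tuples(
    [("ALG", 1984), ("ALG", 1992), ("ALG", 1996),
     ("ANZ", 1984), ("ANZ", 1992), ("ANZ", 1996),
     ("ARG", 1984), ("ARG", 1992), ("ARG", 1992)],
    names=["noc", "year"],
)
df = pd.DataFrame(
    {"total_medals": [2.0, 6.0, 7.0, 19.0, 10.0, 5.0, 11.0, 32.0, 15.0]},
    index=index,
)

# Expanding (cumulative) mean within each country, aligned to the original rows.
df["total_mean"] = (
    df.groupby(level="noc", sort=False)["total_medals"]
      .transform(lambda s: s.expanding(min_periods=1).mean())
)

# Performance: the year's total relative to the running average up to that year.
df["performance"] = df["total_medals"] / df["total_mean"]

print(df.round(3))
```

For ALG this gives performance values of 2.0/2.0 = 1.0, 6.0/4.0 = 1.5 and 7.0/5.0 = 1.4, matching the averages worked out in the question.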

Bronze lagged (Bronze divided by the previous total_medals of the same country):

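# Shift total_medals by one row within each country, then divide;
# the first row of each country has no previous total and becomes NaN
prev_total = df.groupby(level='noc')['total_medals'].shift()
df['bronze_lagged'] = df['Bronze'] / prev_total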

You could create a function for this:

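def lagged_medals(type_of_medal):
    # Ratio of the given medal count to the country's previous total_medals;
    # writes the result to a new '<medal>_lagged' column on the global df
    prev_total = df.groupby(level='noc')['total_medals'].shift()
    df[f'{type_of_medal}_lagged'] = df[type_of_medal] / prev_total

lagged_medals('Silver')
#print(df)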
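
If modifying a global df is undesirable, the same idea can be written as a function that takes the frame as an argument and returns the new column. This is only a sketch: the name lagged_ratio and its parameters are hypothetical, and the small frame below copies the ALG and ANZ rows of the sample.

```python
import pandas as pd

def lagged_ratio(frame, medal, total="total_medals", group_level="noc"):
    """Return `medal` divided by the previous row's `total` within each group."""
    prev_total = frame.groupby(level=group_level)[total].shift()
    return frame[medal] / prev_total

# ALG and ANZ rows from the sample data (Silver and total_medals only).
index = pd.MultiIndex.from_tuples(
    [("ALG", 1984), ("ALG", 1992), ("ALG", 1996),
     ("ANZ", 1984), ("ANZ", 1992), ("ANZ", 1996)],
    names=["noc", "year"],
)
frame = pd.DataFrame(
    {"Silver": [float("nan"), float("nan"), 4.0, 2.0, 2.0, 2.0],
     "total_medals": [2.0, 6.0, 7.0, 19.0, 10.0, 5.0]},
    index=index,
)

# The caller decides where the result goes.
frame["Silver_lagged"] = lagged_ratio(frame, "Silver")
print(frame)
```

The first row of each country is NaN because there is no earlier total to divide by; for ANZ in 1992 the value is 2.0/19.0.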
