
Python Percentage Change by Group

I would like to get the year-on-year change, quarter by quarter, in Value1 and Value2.

import pandas as pd

df =\
pd.DataFrame({'Year':[2010,2010,2010,2010,2009,2009,2009,2009],
              'Quarter':[1,1,2,2,1,1,2,2],
              'Section':['A', 'B', 'A', 'B','A', 'B','A', 'B'],
              'Value1': [1,2,3,4,5,6,7,8],
              'Value2':[10,20,30,40,50,60,70,80]
             })
df.set_index(['Year', 'Quarter', 'Section'], inplace=True)
df

Currently I am doing this:

##Not ideal
df_2009 =\
(df
 .reset_index()
 .where(lambda x: x.Year == 2009)
 .dropna()
 .astype({'Quarter':'int16'})
 .set_index(['Quarter', 'Section'])
 .drop('Year', axis=1)
)

df_2010 =\
(df
 .reset_index()
 .where(lambda x: x.Year == 2010)
 .dropna()
 .astype({'Quarter':'int16'})
 .set_index(['Quarter', 'Section'])
 .drop('Year', axis=1)
)
 
df_2010/df_2009

However, it is not scalable. I wonder if there's a better way to do this, e.g. pandas functions or a UDF.

P.S. The DataFrame above is created by:

(somedata
.groupby(['Year', 'Quarter', 'Section'])
.agg({'Value1':'sum',
      'Value2':'sum'})
)

Are you looking for something like this?

df.groupby(['Quarter','Section']).pct_change(-1)

Output:

                        Value1    Value2
Year Quarter Section                    
2010 1       A       -0.800000 -0.800000
             B       -0.666667 -0.666667
     2       A       -0.571429 -0.571429
             B       -0.500000 -0.500000
2009 1       A             NaN       NaN
             B             NaN       NaN
     2       A             NaN       NaN
             B             NaN       NaN
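Note that pct_change(-1) compares each row with the next row in its group, so the result above relies on the 2010 rows appearing before the 2009 rows. A minimal sketch that does not depend on the incoming row order, assuming each (Quarter, Section) group has one row per consecutive year, is to sort by Year first and then use the default pct_change(). The name yoy is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2010, 2009, 2009, 2009, 2009],
    'Quarter': [1, 1, 2, 2, 1, 1, 2, 2],
    'Section': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Value1': [1, 2, 3, 4, 5, 6, 7, 8],
    'Value2': [10, 20, 30, 40, 50, 60, 70, 80],
}).set_index(['Year', 'Quarter', 'Section'])

# Sort so that years ascend; within each (Quarter, Section) group every row
# is then compared with the row of the previous year.
yoy = (
    df.sort_index(level='Year')
      .groupby(level=['Quarter', 'Section'])
      .pct_change()
)
print(yoy)
```

With this ordering the earliest year in each group is NaN, and the same code should carry over to more than two years as long as no year is missing from a group.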

Another way using pct_change, though not as elegant as @QuangHoang's answer. Adding 1 and calling .dropna() at the end reproduces the output of your code. However, I kept the Year level in the index, since it will be required once you have more years; apart from that, the result is the same as the output of your code:

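# Sort so that, within each (Section, Quarter) group, 2009 comes before 2010
a = (
    df.sort_values(['Section', 'Quarter', 'Year'])
      .groupby(['Section', 'Quarter'])
      .pct_change()              # (this year - previous year) / previous year
      .dropna()                  # drop the first year, which has nothing to compare to
      .sort_values('Quarter')
    + 1                          # turn the relative change back into a ratio
)
a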

Output:

                        Value1    Value2
Year Quarter Section                    
2010 1       A        0.200000  0.200000
             B        0.333333  0.333333
     2       A        0.428571  0.428571
             B        0.500000  0.500000
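Since the quantity wanted here is the ratio df_2010/df_2009 rather than the relative change, the "+ 1" step can be avoided by dividing each row by the previous year's row directly. The following is only a sketch under the same assumption as before (one row per consecutive year in each group); the names ordered, prev_year and ratio are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2010, 2009, 2009, 2009, 2009],
    'Quarter': [1, 1, 2, 2, 1, 1, 2, 2],
    'Section': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Value1': [1, 2, 3, 4, 5, 6, 7, 8],
    'Value2': [10, 20, 30, 40, 50, 60, 70, 80],
}).set_index(['Year', 'Quarter', 'Section'])

# Put the years in ascending order so that shift(1) fetches the previous year.
ordered = df.sort_index(level='Year')

# For every row, the values of the same (Quarter, Section) one year earlier.
prev_year = ordered.groupby(level=['Quarter', 'Section']).shift(1)

# Ratio of each year to the previous one; the first year has no predecessor
# and is dropped.
ratio = (ordered / prev_year).dropna()
print(ratio)
```

The Year level stays in the index, so with more years each row is labelled with the year whose ratio to the preceding year it holds.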
