
Pivot-table in pandas: aggfunc sum during date range

I have a pandas DataFrame like the following:

#               Date      Name  RG
#-----------------------------------
#      1: 2013-04-25     NameA   1
#      2: 2013-04-25     NameB   3
#      3: 2013-04-25     NameC   1
#      4: 2013-04-25     NameD   2
#      5: 2013-04-25     NameE   1
#     ---                                                                              
#  13379: 2020-02-13     NameA   3
#  13380: 2020-02-13     NameB   1
#  13381: 2020-02-13     NameC   4
#  13382: 2020-02-13     NameD   1
#  13383: 2020-02-13     NameE   1

I want to pivot the table and use the Name column as the index. Each Date should then appear as an individual column, so that, for each Name, RG is summed over the preceding six months. For instance, the RG value for NameA on 2020-02-06 is obtained by adding all RG values for NameA between 2019-08-07 and 2020-02-06. The expected output looks like this:

#          Name     2013-04-25      2013-04-31      2013-05-07   ---   2020-02-06      2020-02-13
#--------------------------------------------------------------------------------------------------
#      1: NameA     1               2               3                  7               23
#      2: NameB     3               3               6                  15              21
#      3: NameC     1               4               5                  16              24
#      4: NameD     2               2               7                  19              40
#      5: NameE     1               4               9                  15              21
#     ---                                                                              
#    276: NameDE    3               4               6                  15              22
#    277: NameDF    1               4               6                  17              22
#    278: NameDG    4               8               9                  11              23
#    279: NameDH    2               3               5                  19              24
#    280: NameDI    1               4               6                  18              20

I can pivot the table with the following:

df.pivot_table(
    values='RG', index='Name', columns='Date',
    fill_value=0, aggfunc='sum')

However, the values in each column should be the sum of all RG values over the preceding six months for the same Name. How can I modify aggfunc to accomplish this?

I found the answer myself. Before pivoting, it is necessary to compute, for each row, the sum of RG over the selected period with the following procedure:

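import pandas as pd

# Assumes df['Date'] is already datetime64. Results go into a copy (df4)
# so that every window sum is computed from the original RG values.
df4 = df.copy()
for index, row in df.iterrows():
    currentDate = row['Date']
    previousDate = currentDate - pd.DateOffset(months=6)  # six-month window, as in the question
    name = row['Name']
    # Rows for the same Name with previousDate < Date <= currentDate
    mask = (df['Date'] > previousDate) & (df['Date'] <= currentDate) & (df['Name'] == name)
    df4.loc[index, 'RG'] = df.loc[mask, 'RG'].sum()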
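
The loop above scans the whole frame once per row, which can get slow on larger data. One possible alternative, sketched below, is to pivot first and then loop only over the unique dates. This is a minimal sketch, not a tested replacement: the function name `trailing_sum_pivot`, the `months` parameter, and the `sample` data are made up for illustration, and it assumes the Date column is already datetime64.

```python
import pandas as pd

def trailing_sum_pivot(df, months=6):
    """Hypothetical helper: Name x Date table of RG summed over the trailing window."""
    # One row per Date, one column per Name; missing combinations count as 0.
    wide = df.pivot_table(values="RG", index="Date", columns="Name",
                          aggfunc="sum", fill_value=0).sort_index()
    windows = {}
    for current in wide.index:
        start = current - pd.DateOffset(months=months)
        # Same window as the row loop: start < Date <= current
        in_window = (wide.index > start) & (wide.index <= current)
        windows[current] = wide.loc[in_window].sum()
    # Dict keys (dates) become columns; the Series index (Name) becomes the rows.
    return pd.DataFrame(windows)

# Illustrative data only, not taken from the question.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-10", "2019-01-10", "2019-05-10",
                            "2019-08-10", "2019-08-10"]),
    "Name": ["NameA", "NameB", "NameA", "NameA", "NameB"],
    "RG": [1, 3, 2, 4, 5],
})
result = trailing_sum_pivot(sample)
print(result)
```

In this sample, the 2019-08-10 column for NameA covers 2019-05-10 and 2019-08-10 only (2 + 4 = 6), because 2019-01-10 falls outside the six-month window. The result already has Name as the index and one column per Date, so no further pivot is needed.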
