Pandas groupby cumulative sum and month

I need to sort my df by month, with the cumulative number of entries for each user (there are about 5 or 6 users). Each line is a separate log entry by a user, so a user may have multiple entries on the same day.

        USER        DATE
1      user1  2021-04-15
4      user5  2021-04-15
5      user3  2021-04-15
6      user1  2021-04-15
14     user2  2021-04-16
...      ...         ...
2227   user4  2021-12-30
2228   user5  2021-12-30
2229   user3  2021-12-30
2230   user2  2021-12-30
2231   user1  2021-12-30

I would like to get something like this:

         MONTH    USER  CUMSUM
1      2021-04   user1       3
2      2021-04   user2       5
3      2021-04   user3       2
4      2021-04   user4       0
5      2021-04   user5       1
...        ...     ...     ...
n      2021-12   user1     232
n+1    2021-12   user2     124
n+2    2021-12   user3     152
n+3    2021-12   user4     312
n+4    2021-12   user5     218

The objective is to later graph the cumulative sum by month for each user. I have code that already works, but it iterates over the df and counts each entry for each month in a dict, which is probably not the most efficient way. I tried using cumsum and groupby, but so far without success.

You can use pandas' Grouper (more typically written pd.Grouper) to group by month, but it groups on a datetime index, so you have to set one if you don't already have one (or pass the column name through its key argument).

df['DATE'] = pd.to_datetime(df['DATE'])  # Grouper needs datetime values, not strings
df.set_index('DATE').groupby([pd.Grouper(freq='M'), 'USER']).size()  # entries per month and user; pandas 2.2+ prefers freq='ME'

df['MONTH'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d')  # getting DATE to datetime
df['MONTH'] = df['MONTH'].dt.strftime('%Y-%m')  # applying your format
df['count'] = 1  # adding a count column for cumsum()
df_try = df.groupby(['USER', 'MONTH'])['count'].sum().groupby(level=0).cumsum()  # monthly counts, then running total per user
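
The group-then-cumsum idea above can be checked on a small frame. This is only a sketch: the sample rows are made up rather than taken from the asker's log, and `dt.to_period('M')` is swapped in for the month label so the result does not depend on which `freq` alias the installed pandas accepts.

```python
import pandas as pd

# Made-up sample log: one row per entry (not the asker's data).
df = pd.DataFrame({
    "USER": ["user1", "user2", "user1", "user1", "user2", "user1"],
    "DATE": ["2021-04-15", "2021-04-15", "2021-04-16",
             "2021-05-02", "2021-05-03", "2021-06-10"],
})
df["DATE"] = pd.to_datetime(df["DATE"])

# Label each entry with its calendar month (a Period such as 2021-04).
df["MONTH"] = df["DATE"].dt.to_period("M")

# Entries per (user, month), then a running total within each user.
monthly = df.groupby(["USER", "MONTH"]).size()
cumulative = monthly.groupby(level="USER").cumsum()

# Long format with the column names used in the question.
result = (
    cumulative.rename("CUMSUM")
    .reset_index()
    .sort_values(["MONTH", "USER"], ignore_index=True)
)
print(result)
```

Note that a user with no entries in a given month (user2 in 2021-06 here) gets no row for that month, so the output can be shorter than the table in the question.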

How about this one-liner:

df.groupby([pd.Grouper(key='DATE', freq='M'), 'USER'])['USER'].count().groupby(['USER']).cumsum()
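
The desired output in the question lists every user for every month, including months where a user's count does not change. One possible way to get that, sketched here on made-up data, is to pivot the monthly counts to a wide table with zeros for missing pairs before taking the cumulative sum. The names `counts`, `wide` and `long_df` are illustrative only.

```python
import pandas as pd

# Made-up sample log (not the asker's data); user2 only logs in April.
df = pd.DataFrame({
    "USER": ["user1", "user2", "user1", "user1"],
    "DATE": pd.to_datetime(
        ["2021-04-15", "2021-04-20", "2021-05-02", "2021-06-10"]
    ),
})

# Months as rows, users as columns; absent (month, user) pairs become 0.
month = df["DATE"].dt.to_period("M").rename("MONTH")
counts = df.groupby([month, "USER"]).size().unstack("USER", fill_value=0)

# Running total down each user's column.
wide = counts.cumsum()

# Back to one row per (month, user), as in the question's table.
long_df = (
    wide.reset_index()
    .melt(id_vars="MONTH", var_name="USER", value_name="CUMSUM")
    .sort_values(["MONTH", "USER"], ignore_index=True)
)
print(long_df)
```

The wide table has one column per user, which is also a convenient shape for plotting one line per user (for example with `wide.plot()`, assuming matplotlib is installed). Months in which nobody logged anything still do not appear; reindexing `counts` against a `pd.period_range` before the `cumsum` would be one way to add them.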
