Pandas groupby cumulative sum and month
I need to sort my df by month with the cumulative sum for each user (about 5 or 6 users). Each line is a different log entry by the user, so users may have multiple entries on the same day.
      USER   DATE
1     user1  2021-04-15
4     user5  2021-04-15
5     user3  2021-04-15
6     user1  2021-04-15
14    user2  2021-04-16
...   ...    ...
2227  user4  2021-12-30
2228  user5  2021-12-30
2229  user3  2021-12-30
2230  user2  2021-12-30
2231  user1  2021-12-30
I would like to get something like this:
     MONTH    USER   CUMSUM
1    2021-04  user1  3
2    2021-04  user2  5
3    2021-04  user3  2
4    2021-04  user4  0
5    2021-04  user5  1
...  ...      ...    ...
n    2021-12  user1  232
n+1  2021-12  user2  124
n+2  2021-12  user3  152
n+3  2021-12  user4  312
n+4  2021-12  user5  218
The objective is to later graph the cumulative sum by month for each user. I have code that already works, but it iterates over the df and counts each entry for each month in a dict, which is probably not the most efficient way. I tried using cumsum and groupby, but so far without success.
You can use the pandas Grouper (more typically written pd.Grouper) for the month, but you have to set DATE as the index if it isn't already (or pass key='DATE' to the Grouper), and DATE has to hold real datetimes rather than strings.
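# .size() counts entries per (month, user); with only USER and DATE columns, .sum() has nothing left to add up
df.set_index('DATE').groupby([pd.Grouper(freq='M'), 'USER']).size()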
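For anyone who wants to try the Grouper idea end to end, here is a minimal sketch. The sample rows are made up for illustration, and I am assuming the goal is a running count of log entries per user. I use freq="MS" (month start) here, which is my choice rather than the answer's, because that alias behaves the same across pandas versions, while the month-end alias was renamed from 'M' to 'ME' in pandas 2.2.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's log (one row per entry).
df = pd.DataFrame({
    "USER": ["user1", "user2", "user1", "user1", "user2", "user1"],
    "DATE": ["2021-04-15", "2021-04-16", "2021-04-20",
             "2021-05-03", "2021-05-10", "2021-06-01"],
})
df["DATE"] = pd.to_datetime(df["DATE"])  # Grouper needs real datetimes

# Step 1: number of log entries per (month, user).
monthly = (
    df.set_index("DATE")
      .groupby([pd.Grouper(freq="MS"), "USER"])
      .size()
)

# Step 2: running total of those monthly counts, restarted for each user.
cumulative = monthly.groupby(level="USER").cumsum().rename("CUMSUM")
print(cumulative)
```

Note that a (month, user) pair with no entries simply does not show up in this result, so a user who was inactive in a month has no row for it.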
df['MONTH'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d')  # parse DATE as datetime
df['MONTH'] = df['MONTH'].dt.strftime('%Y-%m')  # keep only year and month, e.g. '2021-04'
df['count'] = 1  # one per log entry, so sums become counts
# monthly count per user, then a running total within each user
df_try = df.groupby(['USER', 'MONTH'])['count'].sum().groupby(level=0).cumsum()
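The desired output in the question has a row for every user in every month, including months where a user logged nothing (user4 in 2021-04). A plain groupby drops those combinations, so here is a rough sketch of one way to fill them in before taking the cumulative sum. The sample data is hypothetical, and the zero-filling step is my reading of the desired table, not something the question states outright.

```python
import pandas as pd

# Hypothetical sample: user2 has no entries in May or June.
df = pd.DataFrame({
    "USER": ["user1", "user2", "user1", "user1", "user1"],
    "DATE": ["2021-04-15", "2021-04-16", "2021-04-20",
             "2021-05-03", "2021-06-01"],
})
df["MONTH"] = pd.to_datetime(df["DATE"]).dt.to_period("M")

# Wide table of monthly counts: one row per month, one column per user,
# with 0 where a user has no entries in a month.
counts = df.groupby(["MONTH", "USER"]).size().unstack("USER", fill_value=0)

# Also add months in which nobody logged anything at all.
all_months = pd.period_range(counts.index.min(), counts.index.max(),
                             freq="M", name="MONTH")
counts = counts.reindex(all_months, fill_value=0)

# Running total down each user's column, then back to long format.
result = (
    counts.cumsum()
          .reset_index()
          .melt(id_vars="MONTH", var_name="USER", value_name="CUMSUM")
          .sort_values(["MONTH", "USER"])
          .reset_index(drop=True)
)
result["MONTH"] = result["MONTH"].astype(str)  # e.g. '2021-04'
print(result)
```

With this layout an inactive user keeps the previous running total for that month instead of disappearing from the table.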
How about this one-liner:
df.groupby([pd.Grouper(key='DATE', freq='M'), 'USER'])['USER'].count().groupby(['USER']).cumsum()
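Since the stated goal is to graph the result, a wide table (months as rows, users as columns) may be more convenient than the long format. The following is only a sketch with made-up rows; it swaps in pd.crosstab for the counting step, which is a different technique from the one-liner, and assumes DATE is already datetime.

```python
import pandas as pd

# Hypothetical sample data; DATE is already datetime64 here.
df = pd.DataFrame({
    "USER": ["user1", "user2", "user1", "user2", "user1"],
    "DATE": pd.to_datetime(["2021-04-15", "2021-04-16", "2021-05-03",
                            "2021-06-10", "2021-06-11"]),
})

# Month of each entry as a monthly period (2021-04, 2021-05, ...).
month = df["DATE"].dt.to_period("M").rename("MONTH")

# crosstab counts entries per (month, user) and fills missing pairs with 0;
# cumsum then accumulates down each user's column.
wide = pd.crosstab(month, df["USER"]).cumsum()
print(wide)
```

If matplotlib is installed, wide.plot() should then draw one line per user. Months in which no user logged anything are not added by crosstab and would need a reindex first.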