I need to sort my df by month with the cumulative sum for each user (about 5 or 6). Each line is a different log entry by the user, so users may have multiple entries on the same day.
USER DATE
1 user1 2021-04-15
4 user5 2021-04-15
5 user3 2021-04-15
6 user1 2021-04-15
14 user2 2021-04-16
... ... ...
2227 user4 2021-12-30
2228 user5 2021-12-30
2229 user3 2021-12-30
2230 user2 2021-12-30
2231 user1 2021-12-30
I would like to get something like this
MONTH USER CUMSUM
1 2021-04 user1 3
2 2021-04 user2 5
3 2021-04 user3 2
4 2021-04 user4 0
5 2021-04 user5 1
... ... ... ...
n 2021-12 user1 232
n+1 2021-12 user2 124
n+2 2021-12 user3 152
n+3 2021-12 user4 312
n+4 2021-12 user5 218
The objective is to later graph the cumulative sum by month for each user. I have a code that is already working but had to iterate on the df and count each entry for each month on a dict. Probably not the most efficient way. I tried using cumsum and groupby but so far without success.
You can use pandas Grouper
or more typically written pd.Grouper
for the month, but you have to set an index if you don't already have one.
df.set_index('DATE').groupby([pd.Grouper(freq = 'M'),'USER']).sum()
df['MONTH'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d') # getting DATE to datetime
df['MONTH'] = df['MONTH'].apply(lambda x: x.strftime("%Y-%m")) # applying your format
df['count'] = 1 # adding a count column for cumsum()
df_try = df.groupby(['USER', 'MONTH']).sum().groupby(level=0).cumsum() # groupby and cumsum
how about this one-liner:
df.groupby([pd.Grouper(key='DATE', freq='M'), 'USER'])['USER'].count().groupby(['USER']).cumsum()
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.