I'm trying to build a cumulative count in a pandas column that's a little tricky: it should only add to the count ONCE per date and ID.
The example below should help explain. My current dataset looks like this:
ID Date Mention_of_Yes
XDA 11/19/2019 0
XDA 12/19/2019 1
XDA 12/19/2019 1
XDA 1/19/2020 1
XDA 2/19/2020 0
XDA 3/19/2020 1
JJL 11/2/2019 1
JJL 11/2/2019 1
JJL 12/2/20019 0
JJL 1/20/2020 1
I'm attempting to add a column that counts in this specific way, where a "Mention_of_Yes" is counted only ONCE per ID on a specific date:
ID Date Mention_of_Yes *Correct CumCount
XDA 11/19/2019 0 0
XDA 12/19/2019 1 1
XDA 12/19/2019 1 1** Only Counts Once Per Date (12/19/2019 in this case)
XDA 1/19/2020 1 2
XDA 2/19/2020 0 2
XDA 3/19/2020 1 3
JJL 19/2/2019 0 0
JJL 10/2/2019 0 0
JJL 11/2/2019 1 1
JJL 11/2/2019 1 1** Only Counts Once Per Date (11/2/2019 in this case)
JJL 12/2/20019 0 1
JJL 1/20/2020 1 2
I've tried different iterations of groupby and cumcount, but can't seem to get the configuration right, like with the code I've used below:
df['Correct_CumCount'] = df.groupby(['ID', 'Mention_of_Yes']).cumcount() + 1
Any help would be greatly appreciated!
You can do it with groupby and cumsum (not cumcount) after drop_duplicates, then ffill, like:
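# Drop repeated (ID, Date, Mention_of_Yes) rows, then take a running sum per ID.
# Assigning back aligns on the index, so the dropped rows get NaN.
df['Correct_CumCount'] = (
    df.drop_duplicates(subset=['ID', 'Date', 'Mention_of_Yes'], keep='first')
      .groupby('ID')['Mention_of_Yes']
      .cumsum()
)
# Fill the dropped duplicate rows with the value from the row above.
df['Correct_CumCount'] = df['Correct_CumCount'].ffill().astype(int)
print(df)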
ID Date Mention_of_Yes Correct_CumCount
0 XDA 11/19/2019 0 0
1 XDA 12/19/2019 1 1
2 XDA 12/19/2019 1 1
3 XDA 1/19/2020 1 2
4 XDA 2/19/2020 0 2
5 XDA 3/19/2020 1 3
6 JJL 11/2/2019 1 1
7 JJL 11/2/2019 1 1
8 JJL 12/2/20019 0 1
9 JJL 1/20/2020 1 2
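For anyone who wants to run this end to end, here is a minimal self-contained sketch. Note that it is a variant, not the exact code above: instead of drop_duplicates plus ffill, it flags the first row of each (ID, Date, Mention_of_Yes) combination with duplicated() and zeroes out the repeats before the per-ID cumsum. That avoids relying on duplicate rows sitting directly below their first occurrence. The intermediate names is_first, counted and result are just illustrative, and the sample frame is rebuilt by hand from the question, with the dates kept as plain strings exactly as posted.

```python
import pandas as pd

# Sample data copied from the question; dates are left as strings,
# including the "12/2/20019" value exactly as it was posted.
df = pd.DataFrame({
    "ID": ["XDA"] * 6 + ["JJL"] * 4,
    "Date": ["11/19/2019", "12/19/2019", "12/19/2019", "1/19/2020",
             "2/19/2020", "3/19/2020",
             "11/2/2019", "11/2/2019", "12/2/20019", "1/20/2020"],
    "Mention_of_Yes": [0, 1, 1, 1, 0, 1, 1, 1, 0, 1],
})

# True only for the first row of each (ID, Date, Mention_of_Yes) combination.
is_first = ~df.duplicated(subset=["ID", "Date", "Mention_of_Yes"], keep="first")

# Keep the flag on first occurrences and replace repeated rows with 0,
# so a "yes" is counted at most once per ID and date.
counted = df["Mention_of_Yes"].where(is_first, 0)

# Running total within each ID.
df["Correct_CumCount"] = counted.groupby(df["ID"]).cumsum()

result = df["Correct_CumCount"].tolist()
print(df)
```

On the sample data this should give the same Correct_CumCount column as the printed output above. If your real Date column has mixed formats, it is probably worth normalizing it (for example with pd.to_datetime) before checking for duplicates, since two differently formatted strings for the same day would not be treated as the same date.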