How to calculate cumulative count of unique elements in pandas

Question

I have a dataframe with a binary column that is indicative of inactive customers (0 = active, 1 = inactive), similar to the following:

month   customer_id inactive
2020-01 customer_1  0
2020-01 customer_2  0
2020-01 customer_3  0
2020-01 customer_3  0
2020-02 customer_1  0
2020-02 customer_1  0
2020-02 customer_2  0
2020-02 customer_2  0
2020-03 customer_2  1
2020-03 customer_3  1
2020-03 customer_4  0
2020-03 customer_4  0
2020-04 customer_1  0
2020-04 customer_1  1
2020-04 customer_4  0
2020-04 customer_5  0

To get a clearer view of the total number of active customers, I want to compute a cumulative count of unique customers per month that also subtracts the customers who have turned inactive. I am looking for output like this:

month   cum_count_unique_customers
2020-01 3
2020-02 3
2020-03 2
2020-04 3

Is there a way to get that result using Pandas?

Thanks for the help!

Answer 1

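Maybe you could try this:

import pandas as pd
# Small illustrative sample; the 'inactive' values here are made up so the filter below has a column to use
df = pd.DataFrame({'month': ['2020-01', '2020-01', '2020-01', '2020-02', '2020-02'],
                   'customer_id': ['customer_1', 'customer_2', 'customer_1',
                                   'customer_1', 'customer_2'],
                   'inactive': [0, 0, 0, 0, 1]})

# Keep only active rows, then count distinct customers per month
df[df.inactive == 0].groupby('month')['customer_id'].nunique()

First the inactive rows are filtered out, then the dataframe is grouped by month and the number of unique customer IDs in each month is counted. Note that this gives a per-month count, not a cumulative one.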
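If a running total of distinct customers is what you need (ignoring the inactive flag for now), one possible sketch is to keep each customer's first appearance and take a cumulative sum of the new customers per month. The data below is the month and customer_id columns from the question; the variable names are only illustrative.

```python
import pandas as pd

# month and customer_id columns from the question's sample
df = pd.DataFrame({
    "month": ["2020-01"] * 4 + ["2020-02"] * 4 + ["2020-03"] * 4 + ["2020-04"] * 4,
    "customer_id": [
        "customer_1", "customer_2", "customer_3", "customer_3",
        "customer_1", "customer_1", "customer_2", "customer_2",
        "customer_2", "customer_3", "customer_4", "customer_4",
        "customer_1", "customer_1", "customer_4", "customer_5",
    ],
})

# Keep only the first row in which each customer appears
first_seen = df.sort_values("month").drop_duplicates(subset="customer_id")

# Number of customers seen for the first time in each month
new_per_month = first_seen.groupby("month")["customer_id"].count()

# Months without any new customer must still appear, with a count of 0
months = sorted(df["month"].unique())
cum_unique = new_per_month.reindex(months, fill_value=0).cumsum()

print(cum_unique.tolist())  # [3, 3, 4, 5]
```

The reindex step matters here: 2020-02 introduces no new customer, so without it that month would be missing from the result.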

Answer 2

Not the most elegant solution, but it may help if you are familiar with SQL:

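# Keep only active rows (0 = active)
df = df[df['inactive'] == 0]
# Count distinct customers per month, like COUNT(DISTINCT customer_id) ... GROUP BY month
df = df[['month', 'customer_id']].groupby(['month']).nunique()
# rename returns a new dataframe, so assign the result back
df = df.rename(columns={"customer_id": "cum_count_unique_customers"})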

Where df is a pandas dataframe.
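A per-month count does not carry customers over from earlier months, so it will not give 3, 3, 2, 3 for the sample in the question. Here is a minimal sketch that does reproduce that output, under two assumptions that the question does not state explicitly: a customer counts as inactive in a month only if every one of their rows in that month has inactive = 1, and a customer who does not appear in a month keeps their last known status.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "month": ["2020-01"] * 4 + ["2020-02"] * 4 + ["2020-03"] * 4 + ["2020-04"] * 4,
    "customer_id": [
        "customer_1", "customer_2", "customer_3", "customer_3",
        "customer_1", "customer_1", "customer_2", "customer_2",
        "customer_2", "customer_3", "customer_4", "customer_4",
        "customer_1", "customer_1", "customer_4", "customer_5",
    ],
    "inactive": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
})

# One status per customer per month (rows = months, columns = customers).
# Assumption: min() means a single active row keeps the customer active.
status = df.groupby(["month", "customer_id"])["inactive"].min().unstack()

# Assumption: carry the last known status forward into months where the
# customer does not appear; customers not seen yet stay NaN.
status = status.sort_index().ffill()

# Count customers whose current status is active (NaN == 0 is False)
result = (
    (status == 0)
    .sum(axis=1)
    .rename("cum_count_unique_customers")
    .reset_index()
)

print(result["cum_count_unique_customers"].tolist())  # [3, 3, 2, 3]
```

If your rule for mixed rows is different (for example, the last row in the month wins), swap min() for the aggregation that matches it; with the sample data that would change the 2020-04 value.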
