How to calculate cumulative count of unique elements in pandas

I have a dataframe with a binary column that is indicative of inactive customers (0 = active, 1 = inactive), similar to the following:

month   customer_id inactive
2020-01 customer_1  0
2020-01 customer_2  0
2020-01 customer_3  0
2020-01 customer_3  0
2020-02 customer_1  0
2020-02 customer_1  0
2020-02 customer_2  0
2020-02 customer_2  0
2020-03 customer_2  1
2020-03 customer_3  1
2020-03 customer_4  0
2020-03 customer_4  0
2020-04 customer_1  0
2020-04 customer_1  1
2020-04 customer_4  0
2020-04 customer_5  0

To get a better view of the total number of active customers, I want a cumulative count of unique customers per month that also subtracts the customers who have turned inactive. I am looking for output like this:

month   cum_count_unique_customers
2020-01 3
2020-02 3
2020-03 2
2020-04 3

Is there a way to get that result using Pandas?

Thanks for the help!

Maybe you could try this:

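import pandas as pd

# Small sample frame. The 'inactive' values are illustrative only;
# the column has to exist for the filter below to run.
df = pd.DataFrame({'month': ['2020-01', '2020-01', '2020-01', '2020-02', '2020-02'],
                   'customer_id': ['customer_1', 'customer_2', 'customer_1',
                                   'customer_1', 'customer_2'],
                   'inactive': [0, 0, 0, 0, 1]})

# Keep the active rows, then count distinct customer ids per month
df[df.inactive == 0].groupby('month')['customer_id'].nunique()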

First the rows for inactive customers are filtered out, then the dataframe is grouped by month and the number of unique customer ids in each month is counted.
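The groupby/nunique line above counts each month on its own, so a customer seen in January but not in February drops out of February's count. Here is a rough sketch of one way to get the running figure the question asks for. It rests on two assumptions that are not stated in the question: within a month a customer counts as inactive only if all of their rows that month are flagged 1, and a customer keeps their last known status in months where they do not appear. The names status and cum_active are just for illustration.

```python
import pandas as pd

# Data copied from the table in the question
rows = [
    ("2020-01", "customer_1", 0), ("2020-01", "customer_2", 0),
    ("2020-01", "customer_3", 0), ("2020-01", "customer_3", 0),
    ("2020-02", "customer_1", 0), ("2020-02", "customer_1", 0),
    ("2020-02", "customer_2", 0), ("2020-02", "customer_2", 0),
    ("2020-03", "customer_2", 1), ("2020-03", "customer_3", 1),
    ("2020-03", "customer_4", 0), ("2020-03", "customer_4", 0),
    ("2020-04", "customer_1", 0), ("2020-04", "customer_1", 1),
    ("2020-04", "customer_4", 0), ("2020-04", "customer_5", 0),
]
df = pd.DataFrame(rows, columns=["month", "customer_id", "inactive"])

# Assumption: a customer is inactive in a month only if every row
# for that month is flagged 1, hence min() per (month, customer).
status = (
    df.groupby(["month", "customer_id"])["inactive"]
      .min()
      .unstack("customer_id")   # months as rows, customers as columns
      .sort_index()
)

# Assumption: carry the last known status forward into months where
# the customer has no rows. Customers not seen yet stay NaN.
status = status.ffill()

# Count customers whose current status is active (0); NaN never equals 0.
cum_active = status.eq(0).sum(axis=1).rename("cum_count_unique_customers")
print(cum_active)
```

With the question's data this gives 3, 3, 2, 3 for the four months, which matches the desired output. If "inactive" should instead be decided by the last row in a month, swapping min() for last() changes the April figure.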

Not the most elegant solution, but it may help if you are familiar with SQL:

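# Keep only the rows for active customers (inactive == 0)
df = df[df['inactive'] == 0]
# Distinct customers per month, like COUNT(DISTINCT customer_id) ... GROUP BY month
df = df[['month', 'customer_id']].drop_duplicates().groupby(['month']).count()
# rename returns a new frame, so assign the result
df = df.rename(columns={"customer_id": "cum_count_unique_customers"})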

Here, df is a pandas DataFrame.
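Since the SQL comparison came up: the question's table has repeated rows per customer, so it matters whether rows or distinct customers are counted. A small sketch using only the March rows from the question, assuming active means inactive == 0 (the names row_count and distinct_count are made up for the example):

```python
import pandas as pd

# The four 2020-03 rows from the question's table
df = pd.DataFrame({
    "month": ["2020-03"] * 4,
    "customer_id": ["customer_2", "customer_3", "customer_4", "customer_4"],
    "inactive": [1, 1, 0, 0],
})

# Keep only active rows
active = df[df["inactive"] == 0]

# Roughly COUNT(customer_id) ... GROUP BY month: counts rows, duplicates included
row_count = active.groupby("month")["customer_id"].count()

# Roughly COUNT(DISTINCT customer_id) ... GROUP BY month
distinct_count = active.groupby("month")["customer_id"].nunique()

print(row_count["2020-03"], distinct_count["2020-03"])
```

Here customer_4 appears twice, so the row count is 2 while the distinct count is 1.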
