How to calculate cumulative count of unique elements in pandas

I have a dataframe with a binary column that indicates inactive customers (0 = active, 1 = inactive), similar to the following:

month   customer_id inactive
2020-01 customer_1  0
2020-01 customer_2  0
2020-01 customer_3  0
2020-01 customer_3  0
2020-02 customer_1  0
2020-02 customer_1  0
2020-02 customer_2  0
2020-02 customer_2  0
2020-03 customer_2  1
2020-03 customer_3  1
2020-03 customer_4  0
2020-03 customer_4  0
2020-04 customer_1  0
2020-04 customer_1  1
2020-04 customer_4  0
2020-04 customer_5  0

To get a better view of the total active customers, I want to compute a cumulative count of unique customers per month that also subtracts the customers that have turned inactive. I am looking for an output that looks like this:

month   cum_count_unique_customers
2020-01 3
2020-02 3
2020-03 2
2020-04 3

Is there a way to get that result using Pandas?

Thanks for the help!

Maybe you could try this:

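import pandas as pd

# Sample data; the 'inactive' values are assumed here so that the filter below can run
df = pd.DataFrame({'month': ['2020-01', '2020-01', '2020-01', '2020-02', '2020-02'],
                   'customer_id': ['customer_1', 'customer_2', 'customer_1',
                                   'customer_1', 'customer_2'],
                   'inactive': [0, 0, 0, 0, 1]})

# Keep active rows, then count distinct customers per month
df[df.inactive == 0].groupby('month')['customer_id'].nunique()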

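First the inactive rows are filtered out, then the dataframe is grouped by month and the number of unique customer IDs in each month is counted. Note that this gives a per-month count, not a running total across months.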

Not the most elegant solution, but it may help if you are familiar with SQL:

# Keep only the active rows (inactive == 0)
df = df[df['inactive'] == 0]
# Count distinct customers per month
df = df[['month', 'customer_id']].groupby(['month']).nunique()
df = df.rename(columns={"customer_id": "cum_count_unique_customers"})

Where df is a pandas dataframe.
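
For the running total asked about in the question, one possible sketch is to track each customer's latest known status and count the active ones month by month. It rests on two assumptions that the question does not state: a customer counts as active in a month if any of their rows that month has inactive == 0, and a customer with no rows in a month keeps their last known status. The names status and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month": ["2020-01"] * 4 + ["2020-02"] * 4 + ["2020-03"] * 4 + ["2020-04"] * 4,
    "customer_id": [
        "customer_1", "customer_2", "customer_3", "customer_3",
        "customer_1", "customer_1", "customer_2", "customer_2",
        "customer_2", "customer_3", "customer_4", "customer_4",
        "customer_1", "customer_1", "customer_4", "customer_5",
    ],
    "inactive": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
})

# One status per (month, customer). Assumption: min() treats a customer as
# active in a month if at least one of their rows has inactive == 0.
status = (
    df.groupby(["month", "customer_id"])["inactive"].min()
      .unstack("customer_id")   # rows: months, columns: customers, NaN if absent
      .sort_index()
      .ffill()                  # assumption: carry the last known status forward
)

# Count customers whose latest known status is active (0) in each month;
# customers not seen yet are NaN and are not counted.
result = status.eq(0).sum(axis=1).rename("cum_count_unique_customers")
print(result)
```

On the sample data this produces 3, 3, 2, 3 for 2020-01 through 2020-04, which matches the output requested in the question. If a different rule should decide a customer's status within a month (for example the last row instead of the minimum), only the aggregation in the groupby step would need to change.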
