Pandas - Rolling cumulative count of distinct values

Question

I have a df like so:

import pandas as pd

df = pd.DataFrame({
         'date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020', '02/01/2020', '03/01/2020', '03/01/2020'],
         'id': [101, 102, 103, 101, 104, 105, 106]
})

Output:

         date   id
0  01/01/2020  101
1  01/01/2020  102
2  01/01/2020  103
3  02/01/2020  101
4  02/01/2020  104
5  03/01/2020  105
6  03/01/2020  106

I need a cumulative count of the distinct id values seen up to and including each date, like so:

         date  id
0  01/01/2020   3
1  02/01/2020   4
2  03/01/2020   6

I have tried things like df.groupby(['date']).nunique(), but that gives the distinct count within each date rather than the running distinct count I need.
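To make the gap concrete, here is a minimal sketch using the sample frame above. It only illustrates why the per-date count falls short; the variable name per_date is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020',
             '02/01/2020', '03/01/2020', '03/01/2020'],
    'id': [101, 102, 103, 101, 104, 105, 106],
})

# Distinct ids within each date only.
# id 101 appears on both 01/01/2020 and 02/01/2020, so it is counted twice overall.
per_date = df.groupby('date')['id'].nunique()
print(per_date.tolist())           # [3, 2, 2]

# Summing these cumulatively over-counts the repeated id.
print(per_date.cumsum().tolist())  # [3, 5, 7], not the desired [3, 4, 6]
```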

Answer 1

I believe it is necessary to first remove duplicate ids with DataFrame.drop_duplicates, then get the counts per date with GroupBy.size, and finally add a cumulative sum with Series.cumsum:

df = df.drop_duplicates('id').groupby('date').size().cumsum().reset_index(name='id')
print(df)
         date  id
0  01/01/2020   3
1  02/01/2020   4
2  03/01/2020   6
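The chain above can be split into its steps to see what each one contributes. This is a sketch on the question's sample data; the intermediate names (first_seen, new_per_date, running) are introduced here for illustration. It assumes the rows are already in chronological order, because drop_duplicates keeps the first row for each id in row order.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020',
             '02/01/2020', '03/01/2020', '03/01/2020'],
    'id': [101, 102, 103, 101, 104, 105, 106],
})

# Step 1: keep only the first row in which each id appears
# (assumes rows are already sorted chronologically).
first_seen = df.drop_duplicates('id')

# Step 2: count how many ids were seen for the first time on each date.
new_per_date = first_seen.groupby('date').size()
print(new_per_date.tolist())   # [3, 1, 2]

# Step 3: a running total of new ids is the cumulative distinct count.
running = new_per_date.cumsum()

# Step 4: turn the Series back into a two-column DataFrame.
result = running.reset_index(name='id')
print(result['id'].tolist())   # [3, 4, 6]
```

Keeping the result in a separate variable, rather than reassigning df, leaves the original frame available for further checks.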

Answer 2

Or we can use DataFrame.duplicated:

(~df.duplicated('id')).groupby(df['date']).sum().cumsum().rename('id').reset_index()

         date   id
0  01/01/2020  3.0
1  02/01/2020  4.0
2  03/01/2020  6.0
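A step-by-step sketch of the same idea on the question's sample data follows. The name is_first is introduced here for illustration, and the astype(int) call is an addition, not part of the answer: the dtype returned when summing booleans per group may differ between pandas versions, so the cast makes the counts integers either way. As before, it assumes the rows are in chronological order.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/2020', '01/01/2020', '01/01/2020', '02/01/2020',
             '02/01/2020', '03/01/2020', '03/01/2020'],
    'id': [101, 102, 103, 101, 104, 105, 106],
})

# True for the first occurrence of each id, False for later repeats.
is_first = ~df.duplicated('id')
print(is_first.tolist())  # [True, True, True, False, True, True, True]

result = (
    is_first.groupby(df['date'])  # group the boolean mask by the date column
            .sum()                # number of first occurrences per date
            .cumsum()             # running total of distinct ids
            .astype(int)          # added here: force integer counts
            .rename('id')
            .reset_index()
)
print(result['id'].tolist())  # [3, 4, 6]
```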


Answer 1: score 3, accepted, 2020-02-06 12:22:30

Answer 2: score 2, 2020-02-06 12:41:44
