Pandas: How to compute a rolling window over one column (grouped by date) and count the distinct values in another column?

Question

I am trying to use Pandas to compute a rolling window over a date column and count the distinct values in another column. Let's say I have this dataframe df:

date    customer
2020-01-01  A
2020-01-02  A
2020-01-02  B
2020-01-03  A
2020-01-03  C
2020-01-03  D
2020-01-04  E
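To follow along, the sample frame can be rebuilt as below. Parsing `date` with `pd.to_datetime` is an assumption, since the question does not state the column's dtype, but a datetime column is what makes calendar-day windows possible.

```python
import pandas as pd

# Rebuild the question's sample data; `date` is parsed as datetime (an assumption).
df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2020-01-01", "2020-01-02", "2020-01-02",
            "2020-01-03", "2020-01-03", "2020-01-03",
            "2020-01-04",
        ]),
        "customer": ["A", "A", "B", "A", "C", "D", "E"],
    }
)
print(df)
```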

I would like to group by the date column, create a rolling window of two days, and count the distinct values in the customer column. The expected output would be something like:

date       distinct_customers
2020-01-01  NaN --> (first value)
2020-01-02  2.0 --> (distinct customers between 2020-01-01 and 2020-01-02: [A, B]) 
2020-01-03  4.0 --> (distinct customers between 2020-01-02 and 2020-01-03: [A, B, C, D])
2020-01-04  4.0 --> (distinct customers between 2020-01-03 and 2020-01-04: [A, C, D, E])
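A slow but direct way to check the expected output is to apply the definition literally: for each date, count the distinct customers seen on that date and the previous calendar day. The sketch below assumes `date` is a datetime column and treats a window that starts before the first date as incomplete (NaN), matching the first row above. The helper name `distinct_in_window` is hypothetical, not from the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2020-01-01", "2020-01-02", "2020-01-02",
            "2020-01-03", "2020-01-03", "2020-01-03",
            "2020-01-04",
        ]),
        "customer": ["A", "A", "B", "A", "C", "D", "E"],
    }
)

def distinct_in_window(df, days=2):
    """Hypothetical helper: distinct customers over a `days`-long calendar window."""
    dates = df["date"].drop_duplicates().sort_values()
    first = dates.iloc[0]
    out = {}
    for d in dates:
        start = d - pd.Timedelta(days=days - 1)
        if start < first:
            # Window reaches before the data starts: report NaN, like the expected output.
            out[d] = np.nan
        else:
            in_window = (df["date"] >= start) & (df["date"] <= d)
            out[d] = float(df.loc[in_window, "customer"].nunique())
    return pd.Series(out, name="distinct_customers").rename_axis("date")

print(distinct_in_window(df))
```

Because it filters on real dates, this version stays correct when some days have no rows, at the cost of a full scan per date.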

It seems easy, but I can't find a straightforward way to do it. I've tried using groupby or rolling, and I haven't found other posts that solve this. Does anyone have an idea how to do this? Thanks a lot in advance!

Answer 1

Based on the idea of @Musulmon, this one-liner should do it:

pd.crosstab(df['date'], df['customer']).rolling(2).sum().clip(0,1).sum(axis=1)

Thanks!
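The one-liner above can be unpacked step by step as below. One detail worth knowing: `DataFrame.sum` defaults to `min_count=0`, so the all-NaN first row sums to 0.0 rather than NaN; passing `min_count=1` (an addition, not part of the original answer) keeps NaN for the incomplete first window.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2020-01-01", "2020-01-02", "2020-01-02",
            "2020-01-03", "2020-01-03", "2020-01-03",
            "2020-01-04",
        ]),
        "customer": ["A", "A", "B", "A", "C", "D", "E"],
    }
)

# 1. One row per date, one column per customer; each cell counts that customer's rows on that date.
visits = pd.crosstab(df["date"], df["customer"])
# 2. Sum each customer's counts over the current and previous row (two rows, not two calendar days).
window_visits = visits.rolling(2).sum()
# 3. clip(0, 1): any positive count becomes 1, meaning "customer present in the window".
present = window_visits.clip(0, 1)
# 4. Count present customers per date; min_count=1 keeps the first (incomplete) window as NaN.
distinct_customers = present.sum(axis=1, min_count=1)
print(distinct_customers)
```

Since `rolling(2)` counts rows of the crosstab, a date with no customers at all would be skipped and the window would span more than two days. If that can happen, one option is to reindex to every calendar day first, e.g. `visits.asfreq("D", fill_value=0)`, assuming `date` is a datetime column.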

Answer 1
1 2020-07-15 15:53:35