
Pandas: how to calculate a rolling window over one column (grouped by date) and count distinct values of another column?

I am trying to compute a rolling window over a date column in Pandas and count the distinct values in another column. Let's say I have this dataframe, df:

date    customer
2020-01-01  A
2020-01-02  A
2020-01-02  B
2020-01-03  A
2020-01-03  C
2020-01-03  D
2020-01-04  E

I would like to group by the date column, create a rolling window of two days, and count the distinct values in the customer column. The expected output would be something like:

date       distinct_customers
2020-01-01  NaN --> (first value)
2020-01-02  2.0 --> (distinct customers between 2020-01-01 and 2020-01-02: [A, B]) 
2020-01-03  4.0 --> (distinct customers between 2020-01-02 and 2020-01-03: [A, B, C, D])
2020-01-04  4.0 --> (distinct customers between 2020-01-03 and 2020-01-04: [A, C, D, E])

It seems easy, but I can't find any straightforward way to achieve it. I've tried using groupby and rolling, and I haven't found other posts that solve this issue. Does anyone have an idea how to do this? Thanks a lot in advance!

Based on the idea from @Musulmon, this one-liner should do it:

pd.crosstab(df['date'], df['customer']).rolling(2).sum().clip(0,1).sum(axis=1)
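For anyone who wants to run this end to end, here is a minimal sketch built on the sample data from the question. The function name rolling_distinct is hypothetical, and two steps go beyond the one-liner and are my own assumptions. First, rolling(2) counts rows rather than calendar days, so the sketch reindexes onto a daily date range to keep a two-row window equal to a two-day window when a date is missing. Second, a plain sum(axis=1) turns the all-NaN first row into 0.0, so min_count=1 is passed to keep the NaN shown in the expected output.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03",
        "2020-01-03", "2020-01-03", "2020-01-04",
    ]),
    "customer": ["A", "A", "B", "A", "C", "D", "E"],
})


def rolling_distinct(frame, window=2):
    """Count distinct customers over a rolling window of `window` days."""
    # One row per date, one column per customer; cells hold occurrence counts.
    counts = pd.crosstab(frame["date"], frame["customer"])

    # Assumption: fill in calendar days that have no rows, so that
    # `window` rows always correspond to `window` consecutive days.
    all_days = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
    counts = counts.reindex(all_days, fill_value=0)

    # Sum the counts over the window, then cap each cell at 1 so that a
    # customer seen several times inside the window is counted only once.
    seen = counts.rolling(window).sum().clip(0, 1)

    # min_count=1 keeps rows without a full window as NaN instead of 0.0.
    return seen.sum(axis=1, min_count=1).rename("distinct_customers")


distinct_customers = rolling_distinct(df)
print(distinct_customers)
```

On the sample data this gives NaN, 2.0, 4.0, 4.0 for the four dates. The crosstab is as wide as the number of distinct customers, so for a very large customer base a different approach may be more memory-friendly.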

Thanks!

