Python Pandas count using a rolling time window

Question

I have a dataframe that looks like this:

customerId Date         Amount_Spent
123        01/01/2018   500
456        01/01/2018   250
123        02/01/2018   300
456        02/01/2018   100
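If you want to reproduce this frame, here is a minimal sketch. It assumes the dates are DD/MM/YYYY, so 02/01/2018 means 2 January 2018. That reading matches the 2018-01-02 shown in the answer's output. The dates are parsed with pd.to_datetime so that day differences can be computed.

```python
import pandas as pd

# Sample data from the question (dates assumed to be DD/MM/YYYY).
df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})

# Convert the strings to datetimes; "02/01/2018" becomes 2018-01-02.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
print(df)
```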

I want to count the (distinct) customers who spent more than 200 on each of two consecutive days.

So I expect to get

customerId Date1        Date2         Total_Amount_Spent
123        01/01/2018   02/01/2018    800

Can someone help me with this?
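The title mentions a rolling time window, so here is one way to express the condition with pandas' time-based rolling('2D'). This is a minimal sketch, not the accepted answer's method. It assumes DD/MM/YYYY dates and that several purchases on the same day should first be summed into one daily total. The names daily, over and qualifying are illustrative. For each date, a '2D' window covers that day and the day before it. A sum of 2 over the "above 200" flags therefore means both consecutive days qualify.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Assumption: one total per customer per day (same-day rows are summed).
# groupby sorts by (customerId, Date), so each customer's dates are increasing.
daily = df.groupby(["customerId", "Date"], as_index=False)["Amount_Spent"].sum()
daily["over"] = (daily["Amount_Spent"] > 200).astype(int)

qualifying = set()
for cust, g in daily.groupby("customerId"):
    # The window ending at each date spans (Date - 2 days, Date], i.e. that
    # calendar day and the previous one; a sum of 2 means both were > 200.
    over_in_window = g.set_index("Date")["over"].rolling("2D").sum()
    if (over_in_window == 2).any():
        qualifying.add(cust)

print(len(qualifying))  # number of distinct qualifying customers
```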

Answer 1

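There are two checks. The first checks the difference in days. The second uses .all() to check that every amount is greater than 200. We then select the customer IDs for which both conditions hold.

import pandas as pd
df['Date']=pd.to_datetime(df['Date'],format='%d/%m/%Y')  # dates are DD/MM/YYYY
s=df.groupby('customerId').agg({'Date':lambda x : (x.iloc[0]-x.iloc[-1]).days==-1,'Amount_Spent':lambda x : (x>200).all()}).all(1)
newdf=df.loc[df.customerId.isin(s[s].index)]  # keep only customers where both checks are True
newdf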
Out[1242]:
   customerId       Date  Amount_Spent
0         123 2018-01-01           500
2         123 2018-01-02           300
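A self-contained version of this first step is sketched below, under the same assumptions: DD/MM/YYYY dates and exactly two rows per customer, as in the sample. Sorting by date first makes iloc[0] and iloc[-1] the earliest and latest dates. Note that s.index contains every customer, and s is True or False for each one. Filtering with s[s].index is what restricts newdf to the customers that pass both checks.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# One boolean per customer and column:
#  - Date: last date is exactly one day after the first
#    (assumes exactly two rows per customer, as in the sample)
#  - Amount_Spent: every amount is above 200
checks = df.sort_values("Date").groupby("customerId").agg(
    {"Date": lambda x: (x.iloc[-1] - x.iloc[0]).days == 1,
     "Amount_Spent": lambda x: (x > 200).all()}
)
s = checks.all(axis=1)  # True only where both checks pass

# s[s].index keeps only the customers whose row in s is True.
newdf = df.loc[df["customerId"].isin(s[s].index)]
print(newdf)
```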

Then use groupby + agg again to get the desired format:

newdf.groupby('customerId').agg({'Date':['first','last'],'Amount_Spent':'sum'})
Out[1244]: 
                 Date            Amount_Spent
                first       last          sum
customerId                                   
123        2018-01-01 2018-01-02          800

Answer 1 (2 votes), accepted 2018-12-10 14:53:11