
Python Pandas Counts using rolling time window

I have a dataframe which looks like this

customerId Date         Amount_Spent
123        01/01/2018   500
456        01/01/2018   250
123        02/01/2018   300
456        02/01/2018   100

I want to count customers (distinct/non-distinct) who have spent more than 200 on two consecutive days.

So I expect to get

customerId Date1        Date2         Total_Amount_Spent
123        01/01/2018   02/01/2018    800

Can someone help me with this?

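There are two checks: one verifies that each customer's first and last dates are exactly one day apart, and the other uses all() to verify that every amount is more than 200. A customer ID is selected only when both conditions are satisfied.

# Date must be datetime64, e.g. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
s=df.groupby('customerId').agg({'Date':lambda x : (x.iloc[0]-x.iloc[-1]).days==-1,'Amount_Spent':lambda x : (x>200).all()}).all(axis=1)
# s is a boolean Series indexed by customerId; s[s] keeps only the customers that pass both checks
newdf=df.loc[df.customerId.isin(s[s].index),]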
newdf
Out[1242]:
   customerId       Date  Amount_Spent
0         123 2018-01-01           500
2         123 2018-01-02           300
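
The day-difference check depends on Date holding real datetimes parsed day-first. The sample dates are assumed here to be dd/mm/yyyy, which matches the 2018-01-02 shown in the output above. A minimal sketch of that setup follows; the DataFrame construction is reconstructed from the question's sample, and `gap_days` is an illustrative name:

```python
import pandas as pd

# Sample data reconstructed from the question; dates assumed to be dd/mm/yyyy.
df = pd.DataFrame({
    'customerId': [123, 456, 123, 456],
    'Date': ['01/01/2018', '01/01/2018', '02/01/2018', '02/01/2018'],
    'Amount_Spent': [500, 250, 300, 100],
})

# An explicit format avoids '02/01/2018' being read month-first as 1 February.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Gap in days between each customer's last and first row (1 means consecutive days).
g = df.groupby('customerId')['Date']
gap_days = (g.last() - g.first()).dt.days
print(gap_days)
```

If the dates were parsed month-first instead, the two sample days would land a month apart and the one-day check would reject every customer.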

Use groupby + agg again to get the format you need:

newdf.groupby('customerId').agg({'Date':['first','last'],'Amount_Spent':'sum'})
Out[1244]: 
                 Date            Amount_Spent
                first       last          sum
customerId                                   
123        2018-01-01 2018-01-02          800
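
The approach above compares only each customer's first and last rows, so it effectively assumes two rows per customer. For longer histories, one possible generalization (a sketch, not the answerer's method) pairs every row with the same customer's previous row using groupby + shift. The 200 threshold comes from the question; the helper name `consecutive_pairs`, the `prev_amount` column, and the assumption of at most one row per customer per day are illustrative.

```python
import pandas as pd

def consecutive_pairs(df, threshold=200):
    """One row per customer per pair of consecutive days that both exceed threshold.

    Assumes a datetime64 Date column and at most one row per customer per day.
    """
    d = df.sort_values(['customerId', 'Date']).copy()
    g = d.groupby('customerId')
    d['Date1'] = g['Date'].shift()                 # previous date for the same customer
    d['prev_amount'] = g['Amount_Spent'].shift()   # previous amount for the same customer
    mask = (
        (d['Date'] - d['Date1']).dt.days.eq(1)     # exactly one day apart
        & d['Amount_Spent'].gt(threshold)
        & d['prev_amount'].gt(threshold)
    )
    out = d.loc[mask, ['customerId', 'Date1', 'Date']].rename(columns={'Date': 'Date2'})
    total = d.loc[mask, 'Amount_Spent'] + d.loc[mask, 'prev_amount']
    out['Total_Amount_Spent'] = total.astype(int)
    return out.reset_index(drop=True)

# Sample data reconstructed from the question; dates assumed to be dd/mm/yyyy.
df = pd.DataFrame({
    'customerId': [123, 456, 123, 456],
    'Date': ['01/01/2018', '01/01/2018', '02/01/2018', '02/01/2018'],
    'Amount_Spent': [500, 250, 300, 100],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

pairs = consecutive_pairs(df)
print(pairs)
print(len(pairs), pairs['customerId'].nunique())
```

Here `len(pairs)` gives the non-distinct count (a customer appears once per qualifying pair of days) and `pairs['customerId'].nunique()` gives the distinct count.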


 