I have a DataFrame that looks like this:
customerId Date Amount_Spent
123 01/01/2018 500
456 01/01/2018 250
123 02/01/2018 300
456 02/01/2018 100
I want to count the customers (distinct and non-distinct) who have spent more than 200 on each of two consecutive days.
So I expect to get:
customerId Date1 Date2 Total_Amount_Spent
123 01/01/2018 02/01/2018 800
Can someone help me with this?
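There are two checks: one verifies that the customer's first and last dates are exactly one day apart, and the other uses all() to verify that every amount is above the threshold (200 in the question). When both conditions hold, we select the ID. This assumes each customer's rows are already sorted by date.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)  # the subtraction below needs real datetimes
s = df.groupby('customerId').agg({'Date': lambda x: (x.iloc[0] - x.iloc[-1]).days == -1, 'Amount_Spent': lambda x: (x > 200).all()}).all(axis=1)
newdf = df.loc[df.customerId.isin(s[s].index)]  # keep only the IDs where both checks are True
newdf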
Out[1242]:
   customerId       Date  Amount_Spent
0         123 2018-01-01           500
2         123 2018-01-02           300
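For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same two checks on the question's sample data. The helper name qualifies and its threshold parameter are my own and not part of the original answer, and I am assuming the dates are day-first (dd/mm/yyyy). Like the one-liner, it only compares each customer's first and last date, so it assumes one row per customer per day and no more than the two days shown in the sample.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be dd/mm/yyyy.
df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(["customerId", "Date"])


def qualifies(group, threshold=200):
    """Hypothetical helper: True if first and last dates are one day apart
    and every amount in the group is above the threshold."""
    one_day_apart = (group["Date"].iloc[-1] - group["Date"].iloc[0]).days == 1
    all_above = bool((group["Amount_Spent"] > threshold).all())
    return one_day_apart and all_above


# Collect the IDs that pass both checks, then filter the original rows.
keep = [int(cid) for cid, g in df.groupby("customerId") if qualifies(g)]
newdf = df[df["customerId"].isin(keep)]
print(newdf)
```

Looping over the groups is slower than agg on large data, but it keeps the two conditions readable and avoids depending on how agg coerces a boolean result computed from a datetime column.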
Use groupby + agg again to get the format you need:
newdf.groupby('customerId').agg({'Date':['first','last'],'Amount_Spent':'sum'})
Out[1244]:
                 Date            Amount_Spent
                first       last          sum
customerId
123        2018-01-01 2018-01-02          800
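If a customer can have more than two rows, comparing only the first and last date no longer works. One possible alternative, which is a different technique from the agg approach above, is to compare each row with the same customer's previous row using groupby + shift. This is only a sketch under my own assumptions: one row per customer per day, day-first dates, and variable names (prev_date, prev_amount, pairs) that I made up for illustration.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be dd/mm/yyyy.
df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(["customerId", "Date"]).reset_index(drop=True)

threshold = 200
grouped = df.groupby("customerId")

# Previous row's date and amount within the same customer (NaT/NaN on first row).
prev_date = grouped["Date"].shift()
prev_amount = grouped["Amount_Spent"].shift()

# A row closes a qualifying pair when it is exactly one day after the previous
# row and both days are above the threshold. Comparisons with NaT/NaN are False.
mask = (
    (df["Date"] - prev_date).eq(pd.Timedelta(days=1))
    & (df["Amount_Spent"] > threshold)
    & (prev_amount > threshold)
)

pairs = pd.DataFrame({
    "customerId": df.loc[mask, "customerId"],
    "Date1": prev_date[mask],
    "Date2": df.loc[mask, "Date"],
    "Total_Amount_Spent": (df.loc[mask, "Amount_Spent"] + prev_amount[mask]).astype(int),
}).reset_index(drop=True)

print(pairs)
print("non-distinct count:", len(pairs))
print("distinct count:", pairs["customerId"].nunique())
```

Each row of pairs is one qualifying pair of consecutive days, so len(pairs) gives the non-distinct count and nunique() on customerId gives the distinct count the question asks about.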