I have the transaction data for all customers for the last 10 years in a dataframe df:
Customer_ID | date | year | Dollars
ABC 2017-02-07 2017 456
ABC 2017-03-05 2017 167
ABC 2017-07-13 2017 345
ABC 2017-05-15 2017 406
ABC 2016-12-13 2016 320
ABC 2016-01-03 2016 305
ABC 2016-10-10 2016 456
ABC 2016-05-10 2016 175
ABC 2015-04-07 2015 145
BCD 2017-09-08 2017 155
BCD 2016-10-22 2016 274
BCD 2016-10-19 2016 255
I would like to add a flag on the row where a customer makes their 4th visit within a year for the first time.
So this would be the output:
Customer_ID | date | year | Dollars | Flag
ABC 2017-02-07 2017 456
ABC 2017-03-05 2017 167
ABC 2017-07-13 2017 345
ABC 2017-05-15 2017 406
ABC 2016-12-13 2016 320 X
ABC 2016-01-03 2016 305
ABC 2016-10-10 2016 456
ABC 2016-05-10 2016 175
ABC 2015-04-07 2015 145
BCD 2017-09-08 2017 155
BCD 2016-10-22 2016 274
BCD 2016-10-19 2016 255
I tried something like this, but it does not generate the output I need, and I don't know how to flag the first time a customer has a 4th visit.
df ['Flag'] = np.where(df[['Customer_ID']].groupby(['year']).agg(['count'])>3, 'X','0')
You can try this, using cumcount (PS: afterwards you can drop the helper columns with df.drop(['Count','Count2'], axis=1)):
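# Count: 0-based visit number within each customer-year, in date order
df['Count'] = df.sort_values('date').groupby(['Customer_ID', 'year']).cumcount()
# Count2: number of earlier years in which this customer already reached the same visit number
df['Count2'] = df.sort_values('date').groupby(['Customer_ID', 'Count']).cumcount()
# Flag the 4th visit (Count == 3) the first time it happens (Count2 == 0)
df['Flag'] = np.where((df['Count'] == 3) & (df['Count2'] == 0), 'X', ' ')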
Customer_ID date year Dollars Count Count2 Flag
0 ABC 2017-02-07 2017 456 0 2
1 ABC 2017-03-05 2017 167 1 1
2 ABC 2017-07-13 2017 345 3 1
3 ABC 2017-05-15 2017 406 2 1
4 ABC 2016-12-13 2016 320 3 0 X
5 ABC 2016-01-03 2016 305 0 1
6 ABC 2016-10-10 2016 456 2 0
7 ABC 2016-05-10 2016 175 1 0
8 ABC 2015-04-07 2015 145 0 0
9 BCD 2017-09-08 2017 155 0 1
10 BCD 2016-10-22 2016 274 1 0
11 BCD 2016-10-19 2016 255 0 0
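For anyone who wants to reproduce the result above, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and uses a slight variation on the approach: one cumcount pass, then it keeps the earliest qualifying row per customer instead of building a second helper column. The function name flag_first_fourth_visit is made up for illustration, and the sketch assumes date can be parsed with pd.to_datetime so that sorting is chronological.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Customer_ID': ['ABC'] * 9 + ['BCD'] * 3,
    'date': ['2017-02-07', '2017-03-05', '2017-07-13', '2017-05-15',
             '2016-12-13', '2016-01-03', '2016-10-10', '2016-05-10',
             '2015-04-07', '2017-09-08', '2016-10-22', '2016-10-19'],
    'Dollars': [456, 167, 345, 406, 320, 305, 456, 175, 145, 155, 274, 255],
})
df['date'] = pd.to_datetime(df['date'])
df.insert(2, 'year', df['date'].dt.year)


def flag_first_fourth_visit(frame):
    """Hypothetical helper: mark each customer's first-ever 4th visit within a year."""
    out = frame.copy()
    by_date = out.sort_values('date')
    # 0-based visit number within each customer-year, in date order
    visit_no = by_date.groupby(['Customer_ID', 'year']).cumcount()
    # All rows that are a 4th visit in some year, still in date order
    fourth = by_date[visit_no == 3]
    # Keep only the earliest such row per customer
    first_idx = fourth.drop_duplicates('Customer_ID').index
    out['Flag'] = ' '
    out.loc[first_idx, 'Flag'] = 'X'
    return out


flagged = flag_first_fourth_visit(df)
print(flagged)
```

The original row order is preserved because the sorted copy is only used to work out which index labels to flag.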
Here ya go!
df = df.sort_values('date')  # cumcount() follows row order, so sort chronologically first
df['Flag'] = np.where(df.groupby(['Customer_ID','year']).cumcount() + 1 == 4, 'X', '')
df['Flag'] = np.where((df.groupby(['Customer_ID','Flag']).cumcount() == 0) & (df['Flag'] == 'X'), 'X', '')
Edited after misunderstanding the question, thanks @Wen. The last line drops duplicate X's for a customer, so that only the first time a customer makes 4 purchases within a year gets marked.
When you do df[['Customer_ID']], this creates a DataFrame object with only one column, named Customer_ID. Therefore, when you try to group by a column named year, you get a KeyError, because that column doesn't exist. Another issue is that applying groupby() and then agg() returns an aggregated dataframe with one row per group rather than one value per original row, which is not what you want here.
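Both points can be checked with a small sketch on a cut-down frame (the rows are a subset of the question's sample). The last step uses groupby(...).transform('count'), which is a swapped-in alternative rather than something from the answers above; it returns one value per original row, which is the shape a column assignment needs.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': ['ABC', 'ABC', 'ABC', 'BCD'],
    'year': [2016, 2016, 2017, 2016],
    'Dollars': [320, 305, 456, 274],
})

# 1) Double brackets keep only Customer_ID, so grouping by 'year' fails
try:
    df[['Customer_ID']].groupby(['year'])
    raised = False
except KeyError:
    raised = True

# 2) agg collapses to one row per group, so it no longer lines up with df
per_group = df.groupby(['Customer_ID', 'year'])['Dollars'].agg('count')

# 3) transform broadcasts each group's count back to every original row
per_row = df.groupby(['Customer_ID', 'year'])['Dollars'].transform('count')

print(raised, len(per_group), len(per_row))
```

Note that a per-year count like this only says which customer-years have 4 or more visits; picking the exact row of the 4th visit still needs something like cumcount.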