I have a pandas DataFrame like the following:
buyer_id item_id order_id date
139 57 387 2015-12-28
140 9 388 2015-12-28
140 57 389 2015-12-28
36 9 390 2015-12-28
64 49 404 2015-12-29
146 49 405 2015-12-29
81 49 406 2015-12-29
140 80 407 2015-12-30
139 81 408 2015-12-30
The real dataframe has many more rows. What I am trying to find out is whether introducing new dishes drives my users to come back. item_id is mapped to a dish name.
What I want to see is whether a specific user orders different dishes on different days. For example, buyer_id 140 ordered two dishes (item_id 9 and 57) on 28th Dec, and the same buyer ordered a different dish (item_id = 80) on 30th Dec.
In that case I want to flag this user as 1.
This is how I am doing it in Python:
item_wise_order.groupby(['date','buyer_id'])['item_id'].apply(lambda x: x.tolist())
It gives me the following output:
date buyer_id
2015-12-28 139 [57]
140 [9,57]
36 [9]
2015-12-29 64 [49]
146 [49]
81 [49]
2015-12-30 140 [80]
139 [81]
Desired output
buyer_id item_id order_id date flag
139 57 387 2015-12-28 1
140 9 388 2015-12-28 1
140 57 389 2015-12-28 1
36 9 390 2015-12-28 0
64 49 404 2015-12-29 0
146 49 405 2015-12-29 0
81 49 406 2015-12-29 0
140 80 407 2015-12-30 1
139 81 408 2015-12-30 1
Similar to Anton's answer, but using apply
users = df.groupby('buyer_id').apply(lambda r: r['item_id'].unique().shape[0] > 1 and
r['date'].unique().shape[0] > 1 )*1
df.set_index('buyer_id', inplace=True)
df['good_user'] = users
result:
item_id order_id date good_user
buyer_id
139 57 387 2015-12-28 1
140 9 388 2015-12-28 1
140 57 389 2015-12-28 1
36 9 390 2015-12-28 0
64 49 404 2015-12-29 0
146 49 405 2015-12-29 0
81 49 406 2015-12-29 0
140 80 407 2015-12-30 1
139 81 408 2015-12-30 1
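If you would rather not move buyer_id into the index, the same rule can probably be written with transform. This is only a sketch on the sample data from the question; the good_user column name is taken from the code above, and the hand-built dataframe is an assumption about how the real data is laid out.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "buyer_id": [139, 140, 140, 36, 64, 146, 81, 140, 139],
    "item_id":  [57, 9, 57, 9, 49, 49, 49, 80, 81],
    "order_id": [387, 388, 389, 390, 404, 405, 406, 407, 408],
    "date": ["2015-12-28"] * 4 + ["2015-12-29"] * 3 + ["2015-12-30"] * 2,
})

# Number of distinct items and dates per buyer, broadcast back to every row.
per_buyer = df.groupby("buyer_id")[["item_id", "date"]].transform("nunique")

# A buyer is flagged when both counts are greater than 1.
df["good_user"] = (per_buyer > 1).all(axis=1).astype(int)

print(df)
```

Because transform returns a result aligned with the original rows, no set_index / reset_index round trip is needed.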
EDIT because I thought of another case: suppose the data shows a buyer buys the same two (or more) goods on two different days. Should this user be flagged as 1 or 0? Because effectively, he/she does not actually choose anything different on the second date. So take buyer 81 in the following table. You see they only buy 49 and 50 on both dates.
buyer_id item_id order_id date
139 57 387 2015-12-28
140 9 388 2015-12-28
140 57 389 2015-12-28
36 9 390 2015-12-28
64 49 404 2015-12-29
146 49 405 2015-12-29
81 49 406 2015-12-29
140 80 407 2015-12-30
139 81 408 2015-12-30
81 50 406 2015-12-29
81 49 999 2015-12-30
81 50 999 2015-12-30
To accommodate this, here's what I came up with (kinda ugly but should work):
# this function is applied to each buyer's rows
def find_good_buyers(buyer):
    # group the buyer's purchases by date
    buyer_dates = buyer.groupby('date')
    # a string representing the unique items purchased on each date
    # (sorted so that the same items always produce the same string)
    items_on_date = buyer_dates.agg({'item_id': lambda x: '-'.join(sorted(x.unique()))})
    # if there is more than 1 combination of item_id, then it means that
    # the buyer has purchased different things on different dates
    # so this buyer must be flagged as 1
    good_buyer = (len(items_on_date.groupby('item_id').groups) > 1) * 1
    return good_buyer

# str.join needs text, so convert item_id to str (astype('S') yields bytes on Python 3)
df['item_id'] = df['item_id'].astype(str)
buyers = df.groupby('buyer_id')
good_buyer = buyers.apply(find_good_buyers)
df.set_index('buyer_id', inplace=True)
df['good_buyer'] = good_buyer
df.reset_index(inplace=True)
This works on buyer 81 setting it to 0 because once you group by date, both dates at which a purchase was made will have the same "49-50" combination of items purchased, hence the number of combinations = 1 and the buyer will be flagged 0.
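The string-joining step could likely be avoided by comparing sets of items instead. The following is a sketch of that idea, not a tested replacement: it builds one frozenset per buyer and date (the name baskets is made up for illustration) and counts how many distinct sets each buyer has.

```python
import pandas as pd

# Sample data from the edited table, including the extra rows for buyer 81.
df = pd.DataFrame({
    "buyer_id": [139, 140, 140, 36, 64, 146, 81, 140, 139, 81, 81, 81],
    "item_id":  [57, 9, 57, 9, 49, 49, 49, 80, 81, 50, 49, 50],
    "order_id": [387, 388, 389, 390, 404, 405, 406, 407, 408, 406, 999, 999],
    "date": ["2015-12-28"] * 4 + ["2015-12-29"] * 3 + ["2015-12-30"] * 2
            + ["2015-12-29", "2015-12-30", "2015-12-30"],
})

# One frozenset of items per (buyer, date); frozensets are hashable and
# ignore the order in which the items appear.
baskets = df.groupby(["buyer_id", "date"])["item_id"].agg(frozenset)

# Number of distinct baskets per buyer.
distinct_baskets = baskets.groupby(level="buyer_id").agg(lambda s: len(set(s)))

# More than one distinct basket means the buyer chose something different.
df["good_buyer"] = df["buyer_id"].map(distinct_baskets).gt(1).astype(int)

print(df)
```

With this version item_id keeps its original integer dtype, and buyer 81 still ends up with 0 because both dates give the same set {49, 50}.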
You could group by buyer_id, then aggregate the date and item_id columns with np.unique. You'll then get np.ndarrays for the buyers that have several dates and item_ids. You can find those rows with an isinstance check against np.ndarray, which gives you a boolean dataframe that you can use to pick out the buyers of interest from the aggregated dataframe. By filtering the original dataframe with the obtained buyers, you can fill in the rows of the flag column with loc:
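import numpy as np

# unique dates and item_ids per buyer
df_agg = df.groupby('buyer_id')[['date', 'item_id']].agg(np.unique)
# True where a buyer has more than one unique value
df_agg = df_agg.applymap(lambda x: isinstance(x, np.ndarray))
buyers = df_agg[(df_agg['date']) & (df_agg['item_id'])].index
# flag every row that belongs to one of those buyers
mask = df['buyer_id'].isin(buyers)
df['flag'] = 0
df.loc[mask, 'flag'] = 1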
In [124]: df
Out[124]:
buyer_id item_id order_id date flag
0 139 57 387 2015-12-28 1
1 140 9 388 2015-12-28 1
2 140 57 389 2015-12-28 1
3 36 9 390 2015-12-28 0
4 64 49 404 2015-12-29 0
5 146 49 405 2015-12-29 0
6 81 49 406 2015-12-29 0
7 140 80 407 2015-12-30 1
8 139 81 408 2015-12-30 1
Output from first and second steps:
In [146]: df.groupby('buyer_id')[['date', 'item_id']].agg(np.unique)
Out[146]:
date item_id
buyer_id
36 2015-12-28 9
64 2015-12-29 49
81 2015-12-29 49
139 [2015-12-28, 2015-12-30] [57, 81]
140 [2015-12-28, 2015-12-30] [9, 57, 80]
146 2015-12-29 49
In [148]: df_agg.applymap(lambda x: isinstance(x, np.ndarray))
Out[148]:
date item_id
buyer_id
36 False False
64 False False
81 False False
139 True True
140 True True
146 False False
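The isinstance check relies on single values coming back as scalars and multiple values as arrays, as in the output above. I am not certain every pandas version behaves that way, so here is a sketch of the same steps that counts unique values with nunique instead. The sample dataframe is rebuilt by hand from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "buyer_id": [139, 140, 140, 36, 64, 146, 81, 140, 139],
    "item_id":  [57, 9, 57, 9, 49, 49, 49, 80, 81],
    "order_id": [387, 388, 389, 390, 404, 405, 406, 407, 408],
    "date": ["2015-12-28"] * 4 + ["2015-12-29"] * 3 + ["2015-12-30"] * 2,
})

# Distinct dates and items per buyer, as plain integers.
counts = df.groupby("buyer_id")[["date", "item_id"]].nunique()

# Buyers with more than one date and more than one item.
buyers = counts[(counts["date"] > 1) & (counts["item_id"] > 1)].index

# Flag every row that belongs to one of those buyers.
df["flag"] = df["buyer_id"].isin(buyers).astype(int)

print(df)
```

The filtering and isin steps are the same as before; only the way "more than one unique value" is detected changes.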