
pandas groupby and find most frequent value (mode)

I have a DataFrame that looks like this:

user_id product_id  created_at
    1   100         2019-04-21 20:20:00
    1   100         2019-04-23 00:10:00
    1   200         2019-05-24 10:00:00
    1   200         2020-06-24 10:10:24
    2   100         2019-01-22 21:10:00
    2   200         2019-04-25 20:23:30
    2   300         2021-01-21 10:20:00
    3   400         2019-12-21 10:20:00
    3   400         2021-04-21 10:20:00
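
For anyone who wants to reproduce this, the sample above can be built roughly as follows. This is only a sketch: parsing created_at with pd.to_datetime is an assumption, since the question does not say whether the column holds strings or timestamps.

```python
import pandas as pd

# Sample data from the table above.
# Assumption: created_at is stored as datetime64 rather than as plain strings.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "product_id": [100, 100, 200, 200, 100, 200, 300, 400, 400],
    "created_at": pd.to_datetime([
        "2019-04-21 20:20:00",
        "2019-04-23 00:10:00",
        "2019-05-24 10:00:00",
        "2020-06-24 10:10:24",
        "2019-01-22 21:10:00",
        "2019-04-25 20:23:30",
        "2021-01-21 10:20:00",
        "2019-12-21 10:20:00",
        "2021-04-21 10:20:00",
    ]),
})
```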

I'm trying to find the most frequently bought product_id for each user_id. I know I can use groupby along with value_counts or pd.Series.mode to get this, something like this:

df.groupby('user_id')['product_id'].apply(lambda x: x.value_counts().index[0]).reset_index()

This works fine until a user_id has two or more product_id values with the same count (as in the sample DataFrame). When this happens, I'd ideally want to select the product_id with the most recent created_at. For example, for user_id 1 I'd want to select product_id 200, since it was bought more recently.
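
To make the tie concrete, here is a small sketch using only the purchases of user_id 1 from the sample. The variable names are illustrative. Both products occur twice, so value_counts reports equal counts and Series.mode returns both values; neither of them looks at created_at.

```python
import pandas as pd

# product_id values bought by user_id 1 in the sample data
user1 = pd.Series([100, 100, 200, 200], name="product_id")

counts = user1.value_counts()  # both products have a count of 2
modes = user1.mode()           # mode() returns every tied value, sorted ascending

print(counts.to_dict())
print(modes.tolist())
```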

What would be the most efficient way to achieve this?

You can calculate both count and max on dates, then sort on these values and drop duplicates (or use groupby().head()):

s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)

Output:

                    count                  max
user_id product_id                            
3       400             2  2021-04-21 10:20:00
1       200             2  2020-06-24 10:10:24
2       300             1  2021-01-21 10:20:00
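
The drop-duplicates variant mentioned above could look roughly like this. It is a sketch on the sample data, assuming created_at is parsed as datetime; the final sort_values and reset_index only tidy the result and are not required.

```python
import pandas as pd

# Sample data from the question (created_at parsed as datetime by assumption).
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "product_id": [100, 100, 200, 200, 100, 200, 300, 400, 400],
    "created_at": pd.to_datetime([
        "2019-04-21 20:20:00", "2019-04-23 00:10:00", "2019-05-24 10:00:00",
        "2020-06-24 10:10:24", "2019-01-22 21:10:00", "2019-04-25 20:23:30",
        "2021-01-21 10:20:00", "2019-12-21 10:20:00", "2021-04-21 10:20:00",
    ]),
})

# Purchase count and latest purchase date per (user_id, product_id) pair.
s = (
    df.groupby(["user_id", "product_id"])["created_at"]
      .agg(["count", "max"])
      .reset_index()
)

# Highest count first, most recent purchase as tie-breaker;
# drop_duplicates keeps the first (best) row for each user.
top = (
    s.sort_values(["count", "max"], ascending=False)
     .drop_duplicates("user_id")
     .sort_values("user_id")
     .reset_index(drop=True)
)
print(top)
```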
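
Alternatively, sort by created_at first, count each (user_id, product_id) pair, then stable-sort by that count so that ties keep the most recent product first:

(
    df.sort_values("created_at", ascending=False)
    # sort=False keeps the groups in most-recent-first order
    .groupby(["user_id", "product_id"], sort=False, as_index=False)
    .count()  # the created_at column now holds the purchase count
    # a stable sort on the count preserves the recency order among ties
    .sort_values("created_at", ascending=False, kind="mergesort")
    .groupby("user_id", as_index=False)
    .head(1)
    .sort_values("user_id")
    .reset_index(drop=True)
)
#    user_id  product_id  created_at
# 0        1         200           2
# 1        2         300           1
# 2        3         400           2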
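
The sample data cannot show whether count really takes precedence over recency, because in it the most recent product is also among the most frequent for every user. A quick check on made-up data may help. Both the helper name most_bought and the data below are hypothetical, and the helper is just one possible way to write the rule using named aggregation.

```python
import pandas as pd


def most_bought(df):
    """Hypothetical helper: most frequent product_id per user_id,
    with ties broken by the most recent created_at."""
    stats = df.groupby(["user_id", "product_id"], as_index=False).agg(
        n=("created_at", "count"),    # purchases per (user, product) pair
        last=("created_at", "max"),   # latest purchase of that pair
    )
    return (
        stats.sort_values(["user_id", "n", "last"],
                          ascending=[True, False, False])
             .drop_duplicates("user_id")          # best row per user
             .set_index("user_id")["product_id"]
    )


# Hypothetical data: user 1 bought product 100 three times, then 200 once
# (most recently); user 2 has a genuine tie between products 100 and 200.
check = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2],
    "product_id": [100, 100, 100, 200, 100, 200],
    "created_at": pd.to_datetime([
        "2019-01-01", "2019-02-01", "2019-03-01", "2020-01-01",
        "2019-01-01", "2019-05-01",
    ]),
})

result = most_bought(check)
print(result.to_dict())
```

User 1 should map to product 100 (most frequent, although not most recent) and user 2 to product 200 (tie broken by recency).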
