
Pandas: Finding cross-sell product combinations across two columns of a DataFrame

What I'm trying to do is a kind of cross-sell analysis.

I have a pandas DataFrame with two columns: one with receipt numbers and the other with product IDs:

receipt  product
1        a
1        b
2        c
3        b
3        a
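For anyone who wants to try the answers below, the sample table above can be rebuilt as a DataFrame like this (a minimal sketch; the column names are taken from the table, and `df` matches the name the answers use):

```python
import pandas as pd

# Sample data from the question: one row per (receipt, product) line item
df = pd.DataFrame({
    "receipt": [1, 1, 2, 3, 3],
    "product": ["a", "b", "c", "b", "a"],
})
print(df)
```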

Most receipts contain many products. What I need is the count of each product combination across receipts. Say products 'a' and 'b' are the most common combination (they appear together on most of the receipts): how do I find this information?

I tried df.groupby(['receipt','product']).count(), but this only gives me a count for each receipt + product combination, not how often products appear together on the same receipt.
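Here is a small sketch of why grouping on both columns cannot answer the question (it rebuilds the sample data; `per_item` is an illustrative name). With only these two columns, `.count()` has no value columns left to count, so `.size()` is used here to make the per-group row counts visible:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"receipt": [1, 1, 2, 3, 3], "product": ["a", "b", "c", "b", "a"]})

# Grouping on both columns makes one group per (receipt, product) line item,
# so each group holds exactly one row and no two products are ever compared.
per_item = df.groupby(["receipt", "product"]).size()
print(per_item)
```

Every group has size 1, so the result carries no information about which products share a receipt.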

Any help is appreciated, thanks!

I think this is what you're looking for:

df.groupby(['receipt']).agg({'product': list}).assign(count=lambda x: x['product'].str.len())

        product  count
receipt
1        [a, b]      2
2           [c]      1
3        [b, a]      2
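The output above lists the products on each receipt but does not yet count pairs. One way to get from those lists to pair counts (a sketch using the standard-library `itertools.combinations` and `collections.Counter`, not part of the original answer; variable names are illustrative) is to enumerate every unordered pair per receipt and tally them:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"receipt": [1, 1, 2, 3, 3], "product": ["a", "b", "c", "b", "a"]})

# One list of products per receipt, as in the answer above
baskets = df.groupby("receipt")["product"].agg(list)

# Count each unordered pair once per receipt:
# sorted(set(...)) drops duplicate line items and fixes the order within a pair
pair_counts = Counter(
    pair
    for products in baskets
    for pair in combinations(sorted(set(products)), 2)
)
print(pair_counts.most_common())
```

`pair_counts.most_common(1)` then gives the single most frequent combination, here `('a', 'b')` with 2 receipts. Replacing `2` with `3` in `combinations` would count triples the same way.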

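I think you can merge the DataFrame with itself on receipt, which pairs every product on a receipt with every other product on that receipt, and then count the pairs: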

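# Pair every product with every other product on the same receipt
new_df = df.merge(df, on='receipt')
# Keep each unordered pair once (this also drops self-pairs), then count receipts per pair
(new_df[new_df['product_x'] < new_df['product_y']]
     .groupby(['product_x','product_y'])['receipt'].count()
)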

Output:

product_x  product_y
a          b            2
Name: receipt, dtype: int64
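Another option, not taken from either answer above, is to build a receipt x product indicator matrix with `pd.crosstab` and multiply its transpose by itself; entry (i, j) of the result counts the receipts that contain both i and j. A sketch, assuming the sample data from the question (the names `basket`, `co`, `pairs`, and the `receipts` column are illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"receipt": [1, 1, 2, 3, 3], "product": ["a", "b", "c", "b", "a"]})

# Receipt x product indicator matrix; clip so a product listed twice on a receipt counts once
basket = pd.crosstab(df["receipt"], df["product"]).clip(upper=1)

# Product x product co-occurrence: entry (i, j) = number of receipts containing both i and j
co = (basket.T @ basket).rename_axis(index="product_x", columns="product_y")

# Reshape to one row per (product_x, product_y) combination
pairs = co.reset_index().melt(id_vars="product_x", var_name="product_y", value_name="receipts")

# Keep each unordered pair once (drops self-pairs), drop pairs that never co-occur, sort by frequency
pairs = (
    pairs[(pairs["product_x"] < pairs["product_y"]) & (pairs["receipts"] > 0)]
    .sort_values("receipts", ascending=False)
    .reset_index(drop=True)
)
print(pairs)
```

Clipping the indicator matrix to 1 keeps duplicate line items from inflating the counts; in the merge approach above, replacing `['receipt'].count()` with `['receipt'].nunique()` should have the same effect.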

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM