Pandas: finding cross-sell in two columns of a DataFrame

Question

What I'm trying to do is a kind of cross-sell analysis.

I have a Pandas DataFrame with two columns: one with receipt numbers and the other with product IDs:

receipt  product
1        a
1        b
2        c
3        b
3        a
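For anyone who wants to try the answers below, the sample table can be rebuilt as a DataFrame like this (a minimal sketch; the column names `receipt` and `product` are taken from the table, and the values are exactly the five rows shown):

```python
import pandas as pd

# Sample data from the question: one row per (receipt, product) line item
df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})
print(df)
```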

Most of the receipts have many products. What I need to find is how often each combination of products appears together on the same receipt. For example, if products 'a' and 'b' are the most common combination (they appear together on the most receipts), how do I find that out?
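One way to make "count of combinations" concrete is to look at unordered pairs of distinct products: for each receipt, generate every pair of the products on it and tally the pairs across all receipts. The sketch below does that in plain Python with `itertools.combinations` and `collections.Counter` rather than a pure pandas approach; restricting to pairs (size 2) is an assumption, and other combination sizes would just change the second argument to `combinations`.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})

pair_counts = Counter()
for _, products in df.groupby('receipt')['product']:
    # Deduplicate and sort so ('a', 'b') and ('b', 'a') count as the same pair
    unique_products = sorted(set(products))
    # Add every unordered pair of distinct products on this receipt
    pair_counts.update(combinations(unique_products, 2))

print(pair_counts.most_common(1))  # → [(('a', 'b'), 2)]
```

Receipt 2 contains a single product, so it contributes no pairs.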

I tried df.groupby(['receipt', 'product']).count(), but that only gives me counts per receipt + product combination, not how often two products appear together on the same receipt.
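The receipt-by-product grouping tried above is actually a useful starting point: laid out as a 0/1 incidence matrix (receipts as rows, products as columns), multiplying its transpose by itself gives a product-by-product co-occurrence matrix, where entry (i, j) is the number of receipts containing both i and j. A minimal sketch, assuming the question's `df`; `pd.crosstab` is swapped in here for the groupby, and the `> 0` step is an assumption that a product listed twice on one receipt should still count once.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})

# Receipt x product incidence matrix: 1 if the product appears on the receipt
incidence = (pd.crosstab(df['receipt'], df['product']) > 0).astype(int)

# Product x product co-occurrence: (i, j) = number of receipts containing both i and j
co = incidence.T.dot(incidence)

# Read off each unordered pair once (the diagonal is single-product counts, so skip it)
pairs = pd.Series({(i, j): co.loc[i, j] for i, j in combinations(co.columns, 2)})
print(pairs.sort_values(ascending=False))
```

The matrix form is convenient when you also want the full table of pairings, for example to plot it as a heatmap.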

Any help is appreciated, thanks!

Answer 1

I think this is what you're looking for:

df.groupby(['receipt']).agg({'product': list}).assign(count=lambda x: x['product'].str.len())

        product  count
receipt
1        [a, b]      2
2           [c]      1
3        [b, a]      2
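The output above lists the products per receipt but doesn't yet count how often each combination occurs. If "combination" means the whole basket on a receipt, one possible follow-up (a sketch, not part of the original answer) is to turn each receipt's products into an order-independent key and count identical keys with `value_counts`. The sorted, comma-joined string key is an illustrative choice; it keeps receipts [a, b] and [b, a] identical.

```python
import pandas as pd

df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})

# One order-independent key per receipt: its distinct products, sorted and joined
baskets = df.groupby('receipt')['product'].agg(lambda s: ', '.join(sorted(set(s))))

# How many receipts have exactly the same basket, most common first
basket_counts = baskets.value_counts()
print(basket_counts)
```

This counts exact baskets only, so a receipt with [a, b, c] would not add to the count for [a, b]; for overlapping combinations, pair counting like Answer 2 is the better fit.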

Answer 2

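I think you can merge the frame with itself on receipt, which pairs every product with every other product on the same receipt, and then count the pairs: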

new_df = df.merge(df, on='receipt')
(new_df[new_df['product_x'] < new_df['product_y']]
     .groupby(['product_x','product_y'])['receipt'].count()
)

Output:

product_x  product_y
a          b            2
Name: receipt, dtype: int64
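Here is a self-contained version of the same self-merge on the sample data, with comments on each step. Two small changes are assumptions rather than part of the answer: `nunique()` replaces `count()` so that a product listed twice on one receipt doesn't inflate a pair's count, and the result is sorted so the most common pair comes first. The `product_x` / `product_y` names come from pandas' default merge suffixes.

```python
import pandas as pd

df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})

# Self-join on receipt: each product is paired with every product on the same receipt
new_df = df.merge(df, on='receipt')

# Keep each unordered pair once and drop self-pairs such as ('a', 'a')
pairs = new_df[new_df['product_x'] < new_df['product_y']]

# Number of distinct receipts per pair, most frequent first
pair_counts = (pairs.groupby(['product_x', 'product_y'])['receipt']
                    .nunique()
                    .sort_values(ascending=False))
print(pair_counts.head(10))
```

The `<` filter is what turns ordered pairs into unordered ones. Note that the merge produces n² rows for a receipt with n products, so very large baskets can make the intermediate frame big.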

2 answers

solution1: 2, 2020-02-07 02:04:27
solution2: 1, accepted, 2020-02-07 02:11:41
