
Shopping basket analysis in Python with pandas

I have a dataframe containing transaction data. Each row represents one transaction, and the columns indicate whether a product from a given category (categories are A-F) was bought (1 = yes, 0 = no). Now I would like to count how often each pair of categories occurs together in a transaction. My dataframe looks as follows:

A  B  C  D  E  F  
1  1  0  0  0  0   
1  0  1  1  0  0 

The output should be a matrix counting each pair of categories in the dataframe, like so:

  A B C D E F
A 4 2 1 0 4 2
B 5 6 7 3 5 1
C 1 6 5 8 7 9
D ...
E ...
F ...

Does anyone know how to solve this?

Thank you very much!

Use the dot product with its transpose:

df.T.dot(df)
Out: 
   A  B  C  D  E  F
A  2  1  1  1  0  0
B  1  1  0  0  0  0
C  1  0  1  1  0  0
D  1  0  1  1  0  0
E  0  0  0  0  0  0
F  0  0  0  0  0  0
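
For reference, here is a self-contained sketch that rebuilds the two example rows from the question and applies the same dot product. The variable name cooc is only illustrative and not part of the original answer.

```python
import pandas as pd

# The two example transactions from the question (1 = bought, 0 = not bought).
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0]],
    columns=list("ABCDEF"),
)

# Entry (i, j) counts the transactions in which categories i and j both appear.
# The diagonal holds the number of transactions containing each single category.
cooc = df.T.dot(df)
print(cooc)
```

The result is symmetric, so the upper triangle alone is enough if only the distinct pairs are needed.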

Note, though, that counting all pairwise co-occurrences does not scale well, since the result has one entry for every pair of categories. You might want to look at the apriori algorithm.
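
As a rough illustration of the pruning idea behind apriori (not a full implementation), the sketch below first keeps only the categories that are frequent on their own and then counts pairs among those. The threshold min_support and the names frequent_items and pair_support are assumptions made up for this example.

```python
from itertools import combinations

import pandas as pd

# The two example transactions from the question (1 = bought, 0 = not bought).
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0]],
    columns=list("ABCDEF"),
)

min_support = 0.5  # assumed threshold: minimum fraction of transactions

# Pass 1: support (fraction of transactions) of each single category.
item_support = df.mean()
frequent_items = [c for c in df.columns if item_support[c] >= min_support]

# Pass 2: a pair can only be frequent if both of its members are frequent,
# so pairs involving infrequent categories are never counted.
pair_support = {}
for a, b in combinations(frequent_items, 2):
    support = (df[a] & df[b]).mean()
    if support >= min_support:
        pair_support[(a, b)] = support

print(frequent_items)
print(pair_support)
```

With many categories and a reasonable threshold, the first pass typically discards most candidates before any pair is examined.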


 