I have a dataframe containing transaction data. Each row represents one transaction, and each column (categories A to F) indicates whether a product from that category was bought (1 = yes, 0 = no). Now I would like to count, for every pair of categories, how many transactions contain both. My dataframe looks as follows:
A B C D E F
1 1 0 0 0 0
1 0 1 1 0 0
The output should be a matrix counting how often each pair of categories occurs together, like so:
A B C D E F
A 4 2 1 0 4 2
B 5 6 7 3 5 1
C 1 6 5 8 7 9
D ...
E ...
F ...
Does anyone know how to solve this?
Thank you very much!
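Take the dot product of the transposed dataframe with the dataframe itself. Because every entry is 0 or 1, cell (i, j) of the result is the number of transactions in which both category i and category j were bought, and the diagonal holds how often each category was bought at all: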
df.T.dot(df)
Out (for the two sample rows above):
A B C D E F
A 2 1 1 1 0 0
B 1 1 0 0 0 0
C 1 0 1 1 0 0
D 1 0 1 1 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
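A self-contained version of the above, rebuilding the two sample rows from the question. The `@` spelling is just the matrix-multiplication operator and gives the same result. Zeroing the diagonal is an optional extra (not part of the original answer), in case only pairs of distinct categories are of interest.

```python
import numpy as np
import pandas as pd

# The two sample transactions from the question (1 = bought, 0 = not bought)
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0]],
    columns=list("ABCDEF"),
)

# (categories x transactions) dot (transactions x categories)
# -> (categories x categories) co-occurrence counts
co = df.T.dot(df)

# Same thing written with the matrix-multiplication operator
assert co.equals(df.T @ df)

# Optional: subtract the diagonal so only pairs of *different* categories remain
pairs_only = co - np.diag(np.diag(co.to_numpy()))

print(co)
```

The result is symmetric, so for large category sets only the upper (or lower) triangle needs to be read.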
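Note, though, that enumerating co-occurrences this way does not scale well once you are interested in itemsets larger than pairs, because the number of possible combinations grows very quickly. In that case you might want to look at the Apriori algorithm.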
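To illustrate the idea, here is a minimal, unoptimised Apriori sketch in plain Python. It assumes transactions are given as sets of category labels and that "frequent" means appearing in at least `min_count` transactions; both the function name `apriori` and that input format are illustrative choices, not any library's API. The key step is pruning: a candidate of size k is only counted if all of its (k-1)-subsets were already frequent.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset contained in at
    least `min_count` transactions (illustrative sketch, not optimised)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate, keep frequent ones
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions, written as sets of category labels
transactions = [{"A", "B"}, {"A", "C", "D"}, {"A", "C"}, {"C", "D"}]
freq = apriori(transactions, min_count=2)
```

With the question's 0/1 dataframe, the rows could be turned into label sets with something like `[set(df.columns[r == 1]) for r in df.to_numpy()]`. For real workloads, a maintained implementation such as `mlxtend.frequent_patterns.apriori`, which accepts a one-hot dataframe like this one, is probably a better choice.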