Efficient algorithms to perform Market Basket Analysis

Question

I want to perform Market Basket Analysis (or Association Analysis) on a retail e-commerce dataset.

The problem I am facing is the huge data size: 3.3 million transactions in a single month. I cannot cut down the transactions, as I may miss some products. The structure of the data is as follows:

Order_ID = Unique transaction identifier

Customer_ID = Identifier of the customer who placed the order

Product_ID = List of all the products the customer has purchased

Date = Date on which the sale has happened

When I feed this data to the Apriori algorithm in Python, my system cannot handle the huge memory requirements. It can run with just 100K transactions. I have 16 GB of RAM.

Any help in suggesting a better (and faster) algorithm is much appreciated.

I can use SQL as well to sort out the data size issues, but then I will get only 1 Antecedent --> 1 Consequent rules. Is there a way to get multi-item rules such as {A,B,C} --> {D,E}, i.e., if a customer purchases products A, B and C, then there is a high chance they will also purchase products D and E?

Answer 1

For a dataset this large, try FP-Growth, which is an improvement on the Apriori method. It also scans the data only twice, whereas Apriori rescans it for each candidate itemset size.

from mlxtend.frequent_patterns import fpgrowth

Then just change:

apriori(df, min_support=0.6)

To

fpgrowth(df, min_support=0.6)

There is also research comparing these algorithms. For memory issues I recommend: Evaluation of Apriori, FP growth and Eclat association rule mining algorithms or Comparing the Performance of Frequent Pattern Mining Algorithms.
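
On the multi-item rules asked about in the question: once the frequent itemsets and their supports are known, rules like {A,B,C} --> {D,E} come from splitting each itemset into an antecedent and a consequent and keeping the splits where confidence = support(itemset) / support(antecedent) is high enough. Below is a small plain-Python sketch of that step. The support counts, the function name rules_from_itemsets and the 0.7 threshold are hypothetical; mlxtend also ships an association_rules helper for the same purpose.

```python
from itertools import combinations

def rules_from_itemsets(support_counts, min_confidence):
    """Return (antecedent, consequent, confidence) for every qualifying split."""
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                if antecedent not in support_counts:
                    continue  # antecedent support unknown, cannot score the rule
                confidence = count / support_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical support counts (number of orders containing each itemset).
support_counts = {
    frozenset("A"): 6, frozenset("B"): 5, frozenset("C"): 5,
    frozenset("AB"): 4, frozenset("AC"): 4, frozenset("BC"): 3,
    frozenset("ABC"): 3,
}

rules = rules_from_itemsets(support_counts, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "-->", sorted(consequent), round(confidence, 2))
```

Larger itemsets give longer rules in the same way, so the real cost is in mining the frequent itemsets, not in this step.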

solution1
0 2022-09-13 07:47:29
