
Efficient algorithms to perform Market Basket Analysis

I want to perform Market Basket Analysis (or Association Analysis) on a retail e-commerce dataset.

The problem I am facing is the huge data size: 3.3 million transactions in a single month. I cannot cut down the transactions, as I might miss some products. The structure of the data is provided below:

Order_ID = Unique transaction identifier

Customer_ID = Identifier of the customer who placed the order

Product_ID = List of all the products the customer has purchased

Date = Date on which the sale has happened

When I feed this data to the apriori algorithm in Python, my system cannot handle the memory requirements: it only runs with about 100K transactions. I have 16 GB of RAM.
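For reference, I build the one-hot basket matrix roughly like this before calling apriori (a sketch: orders stands for my raw table with one row per Order_ID/Product_ID pair, and the sparse output is an attempt to keep memory down):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# orders: raw export with one row per (Order_ID, Product_ID) pair
transactions = orders.groupby('Order_ID')['Product_ID'].apply(list).tolist()

# One-hot encode each basket; sparse output to reduce the memory footprint
te = TransactionEncoder()
basket = te.fit(transactions).transform(transactions, sparse=True)
df = pd.DataFrame.sparse.from_spmatrix(basket, columns=te.columns_)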

Any help in suggesting a better (and faster) algorithm is much appreciated.

I can use SQL as well to work around the data-size issue, but that only gives me 1-antecedent --> 1-consequent rules. Is there a way to get multi-item rules such as {A,B,C} --> {D,E}, i.e., if a customer purchases products A, B and C, then there is a high chance they will also purchase products D and E?

For a huge data size, try FP-Growth, as it is an improvement over the Apriori method: it only scans the data twice, whereas Apriori scans it once for every candidate itemset length.

from mlxtend.frequent_patterns import fpgrowth

Then just change:

apriori(df, min_support=0.6)

To

fpgrowth(df, min_support=0.6)
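For the multi-item rules you mention ({A,B,C} --> {D,E}), you can pass the frequent itemsets from fpgrowth to association_rules; its antecedents and consequents are itemsets, so each side can contain several products. A minimal sketch, assuming df is the one-hot encoded basket DataFrame; the min_support and min_threshold values are placeholders you will need to tune for 3.3 million transactions:

from mlxtend.frequent_patterns import fpgrowth, association_rules

# df: one row per order, one boolean column per product
frequent_itemsets = fpgrowth(df, min_support=0.01, use_colnames=True)

# antecedents/consequents are frozensets, so rules like {A, B, C} -> {D, E}
# appear whenever the data supports them
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())

Note that a min_support of 0.6 is almost certainly too high for retail baskets: with millions of orders, hardly any itemset appears in 60% of them, so you will usually need a much smaller value.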

There is also research comparing these algorithms; for memory issues I recommend "Evaluation of Apriori, FP growth and Eclat association rule mining algorithms" or "Comparing the Performance of Frequent Pattern Mining Algorithms".
