
Apriori Results in Python

I am trying to run an Apriori algorithm in Python. My specific problem is that when I call the apriori function, I set min_length to 2. However, when I print the rules, I get rules that contain only 1 item. Why doesn't apriori filter out itemsets with fewer than 2 items, given that I specified I only want rules with at least 2 items in the itemset?

from apyori import apriori
# store the transactions: one list of items per line of the file
transactions = []
total_transactions = 0
with open('browsing.txt', 'r') as file:
    for transaction in file:
        total_transactions += 1
        transactions.append(transaction.split())

# a minimum count of 100 transactions, expressed as a relative support
support_threshold = 100 / total_transactions
print(support_threshold)

frequent_items = apriori(transactions, min_length=2, min_support=support_threshold)
association_results = list(frequent_items)

print(association_results[0])
print(association_results[1])

My results:

RelationRecord(items=frozenset({'DAI11223'}), support=0.004983762579981351, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'DAI11223'}), confidence=0.004983762579981351, lift=1.0)])
RelationRecord(items=frozenset({'DAI11778'}), support=0.0037619369152117293, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'DAI11778'}), confidence=0.0037619369152117293, lift=1.0)])

A look into the code at https://github.com/ymoch/apyori/blob/master/apyori.py reveals that there is no min_length keyword (only max_length). The way apyori is implemented, it does not raise any warning or error when you pass keyword arguments it does not use.
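To illustrate why min_length slips through silently: going by the source linked above (other versions may differ), apyori reads its options from **kwargs with .get(), so any key it never asks for is simply ignored. Here is a minimal sketch of that pattern using two hypothetical functions, lenient_apriori and strict_apriori, which are not part of apyori and do no actual mining. The strict variant shows how rejecting unknown keywords would have caught the mistake.

```python
# Hypothetical stand-ins that only mimic option handling via kwargs.get();
# they are NOT apyori's real functions and perform no mining.

def lenient_apriori(transactions, **kwargs):
    """Reads only the options it knows about; anything else is silently ignored."""
    min_support = kwargs.get("min_support", 0.1)
    max_length = kwargs.get("max_length", None)
    return {"min_support": min_support, "max_length": max_length}


KNOWN_OPTIONS = {"min_support", "min_confidence", "min_lift", "max_length"}


def strict_apriori(transactions, **kwargs):
    """Same options, but rejects unknown keywords instead of ignoring them."""
    unknown = set(kwargs) - KNOWN_OPTIONS
    if unknown:
        raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
    return lenient_apriori(transactions, **kwargs)


# min_length is dropped without any warning...
print(lenient_apriori([["A", "B"]], min_length=2, min_support=0.01))
# prints: {'min_support': 0.01, 'max_length': None}

# ...whereas the strict version reports it.
try:
    strict_apriori([["A", "B"]], min_length=2)
except TypeError as exc:
    print(exc)
# prints: unexpected keyword argument(s): ['min_length']
```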

Why not filter the result afterwards?

association_results = [r for r in association_results if len(r.items) > 1]
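A self-contained sketch of that post-filtering step follows. Since apyori may not be installed, it uses namedtuple stand-ins shaped like the RelationRecord/OrderedStatistic output printed in the question, with toy items A and B and toy numbers (not from browsing.txt). It keeps itemsets with at least two items using a list, because in Python 3 filter() returns a lazy iterator and indexing it like association_results[0] fails. It also drops ordered statistics with an empty items_base, i.e. the "{} -> X" entries visible in the question's output. With the real library, replace the stand-in list with list(apriori(...)).

```python
from collections import namedtuple

# Stand-ins mirroring the fields printed in the question; with apyori you would
# use the real records from list(apriori(...)) instead.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# Toy results (illustrative numbers): one single-item and one two-item record.
association_results = [
    RelationRecord(frozenset({"A"}), 0.5, [
        OrderedStatistic(frozenset(), frozenset({"A"}), 0.5, 1.0),
    ]),
    RelationRecord(frozenset({"A", "B"}), 0.2, [
        OrderedStatistic(frozenset(), frozenset({"A", "B"}), 0.2, 1.0),
        OrderedStatistic(frozenset({"A"}), frozenset({"B"}), 0.4, 1.6),
    ]),
]

MIN_LENGTH = 2  # what min_length=2 was meant to enforce

# A real list (not a lazy filter object), so it can be indexed and reused.
multi_item = [r for r in association_results if len(r.items) >= MIN_LENGTH]

# Flatten into rules, skipping "{} -> X" statistics that have no antecedent.
rules = [
    (stat.items_base, stat.items_add, stat.confidence, stat.lift)
    for record in multi_item
    for stat in record.ordered_statistics
    if stat.items_base
]

for base, add, confidence, lift in rules:
    print(sorted(base), "->", sorted(add), f"confidence={confidence}, lift={lift}")
# prints: ['A'] -> ['B'] confidence=0.4, lift=1.6
```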

The limitation of the first approach is that the data has to be converted into a list format. A real-life store has many thousands of SKUs, and in that case this is computationally expensive. The apyori package is also outdated: it has not received any updates in the past few years. Its results come in an awkward format that has to be reshaped before it can be presented properly, which costs additional computation. mlxtend uses a two-step approach: it first generates the frequent itemsets and then builds association rules on top of them (check here for more info). mlxtend's output is properly structured, and the library has community support.
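mlxtend's mlxtend.frequent_patterns module exposes the two steps as separate functions, apriori for frequent itemsets and association_rules for rules. The following is a pure-Python sketch of that two-step idea, not mlxtend's implementation, so it runs without extra dependencies. The function names frequent_itemsets and association_rules here are hypothetical, and the sample transactions are toy data. It also shows where a minimum-length filter naturally fits: in the rule-generation step, after frequent itemsets are known.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise Apriori search for itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    support = {}
    # Level 1 candidates: every single item.
    current = {frozenset([item]) for basket in baskets for item in basket}
    k = 1
    while current:
        # Count how many baskets contain each candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        support.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates...
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # ...and prune any candidate with an infrequent (k-1)-subset (Apriori property).
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return support


def association_rules(support, min_confidence, min_len=2):
    """Step 2: derive rules base -> add from frequent itemsets with >= min_len items."""
    rules = []
    for itemset, supp in support.items():
        if len(itemset) < min_len:
            continue  # the "minimum length" filter the question was after
        for r in range(1, len(itemset)):
            for base in combinations(sorted(itemset), r):
                base = frozenset(base)
                add = itemset - base
                # Every subset of a frequent itemset is frequent, so both lookups exist.
                confidence = supp / support[base]
                if confidence >= min_confidence:
                    lift = confidence / support[add]
                    rules.append((base, add, supp, confidence, lift))
    return rules


# Toy usage; rule order may vary between runs because sets are unordered.
transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]]
itemsets = frequent_itemsets(transactions, min_support=0.5)
for base, add, supp, conf, lift in association_rules(itemsets, min_confidence=0.7):
    print(sorted(base), "->", sorted(add),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Keeping the two steps separate means the expensive itemset search runs once, and rule thresholds such as min_confidence or min_len can then be varied cheaply on the stored supports.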

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 