Why does Apriori run faster than FP-Growth in this implementation?

I am using Christian Borgelt's FP-Growth and Apriori packages to find frequent item sets and association rules. According to his paper, FP-Growth performs better than Apriori in all cases.

Running FP-Growth on my machine on a ~36 MB (~500,000 lines) CSV file shows:

import time
from fim import apriori, fpgrowth

s = time.time()
# tracts is a list of lists (one list of items per transaction)
fp = fpgrowth(tracts, target='r', supp=0.0065, zmin=2, report="C,S")
e = time.time()
print(e - s)

41.10438871383667

Whereas Apriori results in:

s = time.time()
ap = apriori(tracts, target='r', supp=0.0065, zmin=2, report="C,S")
e = time.time()
print(e - s)

34.50810647010803
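
A single wall-clock run of each call can be noisy, so the roughly 6.6-second gap above may partly reflect caching or machine load. As a rough sketch only, the helper below repeats a call and keeps the fastest time. The name `best_of` and the stand-in workload are hypothetical and not part of the `fim` package; with `fim` installed, the lambda would wrap the `fpgrowth(...)` or `apriori(...)` call instead.

```python
import time


def best_of(fn, repeats=3):
    """Run fn() `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock intended for interval timing
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload (an assumption for illustration). With fim this could be:
#   lambda: fpgrowth(tracts, target='r', supp=0.0065, zmin=2, report="C,S")
elapsed, value = best_of(lambda: sum(range(100_000)))
print(elapsed)
```

Taking the minimum rather than the mean is one common choice, since it is less sensitive to interference from other processes.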

What am I missing in the implementation?

There is no guarantee that either is always better than the other. Apriori can be very fast if no items satisfy the minimum support, for example, because it can then stop after the first pass over the data. When your longest itemsets are 2-itemsets, a quite naive version can be fine. Apriori pruning as well as the FP-tree only begin to shine when you go for (more interesting!) longer itemsets, which may require choosing a low support parameter.
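
To illustrate the point about short itemsets, here is a minimal sketch of a naive two-pass pair counter in plain Python. It is not how the `fim` package works internally; the function name `frequent_pairs` and the toy data are made up for illustration. With only 2-itemsets to find, this amounts to two scans and a hash table, which leaves little for an FP-tree to improve on.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: relative support} for item pairs meeting min_support."""
    n = len(transactions)
    min_count = min_support * n

    # Pass 1: count single items (each item at most once per transaction).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}

    # Pass 2: count only pairs built from frequent items (Apriori-style pruning).
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {p: c / n for p, c in pair_counts.items() if c >= min_count}


# Hypothetical toy data: four transactions.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
result = frequent_pairs(toy, min_support=0.5)
print(result)
```

On the toy data, item `d` is pruned after the first pass, and the pair `('b', 'c')` is dropped because it occurs in only one of four transactions.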
