Frequent Itemsets and Association Rules: the Apriori Algorithm

Question

I'm trying to understand the fundamentals of the Apriori (Basket) algorithm for use in data mining.

It's best if I explain the complication I'm having with an example.

Here is a transactional dataset:

t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes

The minsup for the above is 0.5, or 50%.

From the above, my number of transactions is clearly 7, meaning that for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up to 4). As such, this was my set of frequent 1-itemsets:

F1:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
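The F1 counts above can be reproduced with a short script. This is only a minimal sketch: the variable names (`transactions`, `min_count`, `f1`) are chosen for illustration, and it assumes the support threshold is rounded up to a whole number of transactions.

```python
import math
from collections import Counter

# The seven transactions from the question, each as a set of items
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
minsup = 0.5

# Smallest whole number of transactions that reaches minsup (assumed rounding up)
min_count = math.ceil(minsup * len(transactions))

# Count in how many transactions each single item occurs
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that meet the minimum count
f1 = {item: n for item, n in item_counts.items() if n >= min_count}

print(min_count)
print(sorted(f1.items()))
```

Clothes (3 transactions) and Boots (1 transaction) fall below the minimum count, which is why they do not appear in F1.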

I then created my candidates for the second pass (C2) and narrowed them down to:

F2:

{Milk, Beer} = 4
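The step from F1 to F2 can be checked in the same way. The sketch below assumes the candidate pairs (C2) are simply all 2-item combinations of the frequent single items; `support_count` is a hypothetical helper, not part of any library.

```python
from itertools import combinations

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # minsup 0.5 over 7 transactions, rounded up
f1_items = ["Beer", "Cheese", "Chicken", "Milk"]  # the frequent single items (F1)


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


# C2: every pair that can be built from the frequent single items
c2 = [frozenset(pair) for pair in combinations(f1_items, 2)]
c2_counts = {pair: support_count(pair, transactions) for pair in c2}

# F2: the candidate pairs that meet the minimum count
f2 = {pair: n for pair, n in c2_counts.items() if n >= min_count}

for pair, n in c2_counts.items():
    print(sorted(pair), n)
```

Of the six candidate pairs, only {Milk, Beer} reaches 4 transactions; the next closest, {Chicken, Beer} and {Chicken, Cheese}, reach 3.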

This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the members of F1 aren't "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:

Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
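These two figures follow from the usual definition, confidence(X -> Y) = support(X and Y together) / support(X). A minimal sketch for checking them, where `support_count` and `confidence` are illustrative helper names:

```python
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


milk_to_beer = confidence({"Milk"}, {"Beer"}, transactions)  # 4 / 4
beer_to_milk = confidence({"Beer"}, {"Milk"}, transactions)  # 4 / 5

print(f"Milk -> Beer: {milk_to_beer:.0%}")
print(f"Beer -> Milk: {beer_to_milk:.0%}")
```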

It seems superfluous to put F1's itemsets in here, as they will all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the itemsets in F1 are indeed "frequent".

Answer 1

Itemsets of size 1 are considered frequent if their support is high enough. But here you also have to consider the minimum itemset size: if the minimum size in your example is 2, then F1 will not be considered, but if the minimum size is 1, then you have to include it.

You can take a look here and here for more ideas and examples.

Hope that helps.

Answer 2

If the minimum support threshold (minsup) is 4/7, then you should include single items in the set of frequent itemsets if they appear in at least 4 of the 7 transactions. So in your example, you should include them:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4

Association rules have the form X ==> Y, where X and Y are disjoint itemsets, and it is generally assumed that neither X nor Y is empty (this is what Apriori assumes). Therefore, you need at least two items to generate an association rule.
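To illustrate why a single-item itemset yields no rules, here is a small sketch that splits an itemset into every pair of non-empty, disjoint parts X and Y. The function name `rules_from_itemset` is made up for this example, and the sketch only enumerates candidate rules without filtering them by confidence.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """List every (X, Y) split of itemset where X and Y are non-empty and disjoint."""
    items = sorted(itemset)
    rules = []
    # X takes 1 .. len-1 items, so both X and Y always keep at least one item
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


single_rules = rules_from_itemset({"Milk"})
pair_rules = rules_from_itemset({"Milk", "Beer"})

print(single_rules)
for x, y in pair_rules:
    print(x, "==>", y)
```

With one item there is no way to fill both sides, so the list of rules is empty; with {Milk, Beer} the only possible rules are Beer ==> Milk and Milk ==> Beer.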

2 answers

Answer 1: 2, accepted, 2013-01-06 16:47:22
Answer 2: 0, 2013-05-04 22:33:10
