
R apriori rules of specific columns

I have the following dataset:

from | to | time_period
house | shop | evening
residential building | transportation | night
....
food | public building | morning

I use the Apriori algorithm:

library(arules)
rules <- apriori(data, parameter = list(support = 0.01, confidence = 0.5))
inspect(head(sort(rules, by = "lift"), 10))

And it produces the following output:

   lhs                            rhs                         support    confidence lift
1  {from=residential building,                                                              
    to=food}                   => {time_period=night}         0.01398601  0.5882353 2.285806
2  {from=entertainment}        => {time_period=evening}       0.02517483  0.5294118 2.188031
3  {to=entertainment}          => {time_period=evening}       0.01678322  0.5217391 2.156321
4  {to=food,                                                                                
    time_period=night}         => {from=residential building} 0.01398601  1.0000000 1.735437
5  {from=food,                                                                              
    to=food}                   => {time_period=daytime}       0.01538462  0.8461538 1.689944
6  {from=public building,                                                                   
    to=food}                   => {time_period=daytime}       0.01958042  0.8235294 1.644758
7  {to=residential building,                                                                
    time_period=night}         => {from=residential building} 0.19580420  0.9459459 1.641629
8  {time_period=night}         => {from=residential building} 0.24195804  0.9402174 1.631688
9  {from=education,                                                                         
    to=residential building}   => {time_period=daytime}       0.01538462  0.7857143 1.569234
10 {from=food,                                                                              
    to=residential building}   => {time_period=daytime}       0.02237762  0.7619048 1.521681

It generates all kinds of rules, but this is not exactly what I want. I want only rules of the form

{from, time_period} => {to}

For example, {from=food, time_period=daytime} => {to=residential building}. I am not interested in any rules other than {from, time_period} => {to}, which means that rules like {from, to} => {time_period}, {time_period} => {from}, or any others do not interest me.

How can I do that?

Just filter the results, and keep only the rules you are interested in.

The expensive part of Apriori is finding the frequent itemsets, and you cannot save much there. You also need the frequencies of itemsets that do not contain a to item, because the confidence of a rule {from, time_period} => {to} is divided by the support of {from, time_period}.
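To make that concrete, these are the standard definitions of confidence and lift, written out for the rule shape the question wants:

```latex
\mathrm{conf}(\{from, time\_period\} \Rightarrow \{to\})
  = \frac{\mathrm{supp}(\{from, time\_period, to\})}{\mathrm{supp}(\{from, time\_period\})},
\qquad
\mathrm{lift} = \frac{\mathrm{conf}}{\mathrm{supp}(\{to\})}
```

The denominator supp({from, time_period}) is an itemset with no to item, so it has to be counted regardless. The output above is consistent with these definitions: rule 1 gives supp({time_period=night}) = 0.5882353 / 2.285806 ≈ 0.2573, and rule 8 gives the same value as 0.24195804 / 0.9402174 ≈ 0.2573.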

Generating the actual rules afterwards is cheap, so you might as well generate all of them and then keep only those with a to item on the right-hand side.
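A minimal sketch of this "mine everything, then filter" approach with the arules package. The small data frame is hypothetical toy data in the question's shape (three factor columns); the filter uses arules' subset() method for rules together with its %pin% partial-matching operator on item labels such as "to=shop".

```r
library(arules)

# Hypothetical toy data in the question's shape; factors are needed so that
# apriori() can coerce the data.frame to transactions ("from=house", ...).
data <- data.frame(
  from        = c("house", "residential building", "food", "food", "residential building"),
  to          = c("shop", "residential building", "residential building",
                  "residential building", "residential building"),
  time_period = c("evening", "night", "daytime", "daytime", "night"),
  stringsAsFactors = TRUE
)

# Step 1: mine all rules with the question's thresholds.
rules <- apriori(data,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control   = list(verbose = FALSE))

# Step 2: keep only {from=..., time_period=...} => {to=...}:
# the RHS must be a "to=" item, and the LHS must hold both a "from=" item
# and a "time_period=" item.
wanted <- subset(rules, subset = rhs %pin% "to=" &
                                 lhs %pin% "from=" &
                                 lhs %pin% "time_period=")

inspect(head(sort(wanted, by = "lift"), 10))
```

With three columns, a two-item LHS plus a to item on the right is exactly the {from, time_period} => {to} shape, so no further check is needed.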

However, given that you only have three columns, and one of them is your desired outcome, you don't need association rule mining at all.

Apriori and similar algorithms are beneficial when you have long rules, because they can avoid redundant computations. Apriori starts getting interesting at rule length 3 at the earliest.

Here, Apriori will not be more efficient than just enumerating all from -> to, time_period -> to, and from + time_period -> to combinations and computing the desired quality measure for each.
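A sketch of that direct enumeration for the from + time_period -> to case in base R, with no association-rule package. The toy data are hypothetical; support, confidence and lift follow the standard definitions, and the thresholds mirror the question's apriori() call.

```r
# Hypothetical toy data in the question's shape.
data <- data.frame(
  from        = c("house", "residential building", "food", "food",
                  "residential building", "food"),
  to          = c("shop", "residential building", "residential building",
                  "residential building", "residential building", "shop"),
  time_period = c("evening", "night", "daytime", "daytime", "night", "daytime"),
  stringsAsFactors = FALSE
)
n <- nrow(data)

# Count every observed (from, time_period, to) combination ...
full_counts <- aggregate(list(n_rule = rep(1L, n)),
                         by = data[c("from", "time_period", "to")], FUN = sum)
# ... every (from, time_period) antecedent ...
lhs_counts  <- aggregate(list(n_lhs = rep(1L, n)),
                         by = data[c("from", "time_period")], FUN = sum)
# ... and every value of the consequent `to`.
rhs_counts  <- aggregate(list(n_rhs = rep(1L, n)), by = data["to"], FUN = sum)

# Join the counts and derive the usual quality measures.
rules <- merge(merge(full_counts, lhs_counts), rhs_counts)
rules$support    <- rules$n_rule / n
rules$confidence <- rules$n_rule / rules$n_lhs
rules$lift       <- rules$confidence / (rules$n_rhs / n)

# Same thresholds as the apriori() call in the question.
rules <- subset(rules, support >= 0.01 & confidence >= 0.5)
print(rules[order(-rules$lift),
            c("from", "time_period", "to", "support", "confidence", "lift")])
```

The from -> to and time_period -> to cases work the same way with a single grouping column in place of the pair.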

I came across the same situation, where I needed to find relationships involving only two attributes of a data frame that had 11 attributes in total. Using apriori on all of them generated roughly 7000 rules.

Try something like the following; it worked for me.

rules <- apriori(dplyr::select(data, from, time_period, to),
                 parameter = list(support = 0.01, confidence = 0.5))

The input to the apriori function is only the columns of interest. I extracted those columns using select from the dplyr package.
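Selecting columns limits which items exist, but apriori() still returns rules of every shape among the selected columns (the question's output already comes from just three columns). If only {from, time_period} => {to} is wanted, arules' apriori() also accepts an appearance argument that restricts which items may occur on each side. A minimal sketch, assuming hypothetical toy data; minlen = 3 counts all items in the rule, i.e. two LHS items plus one RHS item.

```r
library(arules)

# Hypothetical toy data in the question's shape (factor columns).
data <- data.frame(
  from        = c("house", "residential building", "food", "food", "residential building"),
  to          = c("shop", "residential building", "residential building",
                  "residential building", "residential building"),
  time_period = c("evening", "night", "daytime", "daytime", "night"),
  stringsAsFactors = TRUE
)

# Item labels for the consequent, e.g. "to=shop".
to_items <- paste0("to=", levels(data$to))

# Only "to=" items may appear on the RHS; every other item is LHS-only.
# minlen = 3 forces a two-item LHS, which here must be {from, time_period}.
rules <- apriori(data,
                 parameter  = list(support = 0.01, confidence = 0.5, minlen = 3),
                 appearance = list(rhs = to_items, default = "lhs"),
                 control    = list(verbose = FALSE))

inspect(sort(rules, by = "lift"))
```

Unlike filtering afterwards, this constrains the rule generation step itself, though the frequent itemsets still have to be counted.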

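Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.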

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM