R pruning mining rules - apriori

I have a question about rules obtained using apriori in R.

After data input, conversion and so on, I typed the command

rules <- apriori(orders, parameter = list(supp = 0.01, conf = 0.5, maxlen = 2))

and I get my rules. For example:

   lhs        rhs        support confidence      lift
1  {16058} => {16059} 0.01218522  0.9375000 67.886029
2  {16059} => {16058} 0.01218522  0.8823529 67.886029
3  {10049} => {10021} 0.01462226  0.7826087 34.406832
4  {10021} => {10049} 0.01462226  0.6428571 34.406832

My question is: is there a way to prune rules that are not interesting to me? In this case I'd like to see just the first and the third rule, to avoid "circular" rules where the same two items are connected by two rules, one in each direction.

Thank you!

I will provide an alternative to the rdatamining Titanic example, as I found that approach to be very inefficient for larger rule bases. This problem is indeed mentioned in the CRAN Intro to arules, p. 14.

Instead of building the whole subset matrix (which can be very large), you could use the is.redundant function from arules. This basically looks for all more specific rules (same consequent/RHS, but more items in the LHS/antecedent) that have equal or lower lift/confidence/[other metric].

More formally, a rule X => Y is treated as redundant if some proper subset X' of X gives a rule X' => Y whose lift or confidence is at least as high, so the extra items in X bring no improvement.

[Image: formal problem description]
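Written out as a formula, the condition described above might look as follows. This is a reconstruction from the surrounding text, not the original figure; the symbol m is an assumed placeholder for whichever measure is chosen (lift, confidence, ...).

```latex
X \Rightarrow Y \ \text{is redundant}
\iff
\exists\, X' \subset X : \quad m(X' \Rightarrow Y) \;\ge\; m(X \Rightarrow Y)
```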

The code would then look as follows, using lift as the measure of improvement:

library(arules)
# Mine the rules, then sort them by lift (highest first)
rules <- apriori(df, parameter = list(supp = 0.01, conf = 0.5, target = "rules"))
rules_lift <- sort(rules, by = "lift")
# Drop every rule for which a more general rule has equal or higher lift
rules_pruned <- rules_lift[!is.redundant(rules_lift, measure = "lift")]
inspect(head(rules_pruned, 20))

I give this solution since you referred to the Titanic example as a solution to your problem. This approach, however, does not remove the circular rules you mention in the problem description, because for those the consequent Y is not the same.
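For the circular pairs themselves, one possible approach is to treat A => B and B => A as the same unordered pair and keep only the direction with the higher confidence. The sketch below is base R only and works on a hand-typed data frame holding the four rules from the question; the names rules_df, pair_key and rules_one_direction are made up for illustration, and choosing confidence as the tie-breaker is an assumption (both directions always share the same support and lift).

```r
# The four rules from the question, typed in by hand (illustrative data only).
rules_df <- data.frame(
  lhs        = c("16058", "16059", "10049", "10021"),
  rhs        = c("16059", "16058", "10021", "10049"),
  support    = c(0.01218522, 0.01218522, 0.01462226, 0.01462226),
  confidence = c(0.9375000, 0.8823529, 0.7826087, 0.6428571),
  lift       = c(67.886029, 67.886029, 34.406832, 34.406832),
  stringsAsFactors = FALSE
)

# Order-independent key: A => B and B => A end up with the same key.
pair_key <- paste(pmin(rules_df$lhs, rules_df$rhs),
                  pmax(rules_df$lhs, rules_df$rhs), sep = "|")

# Walk through the rules from highest to lowest confidence and keep
# only the first rule seen for each pair.
by_conf <- order(rules_df$confidence, decreasing = TRUE)
keep <- by_conf[!duplicated(pair_key[by_conf])]

# Restore the original row order for display.
rules_one_direction <- rules_df[sort(keep), ]
print(rules_one_direction)
```

On an actual rules object from arules, the same idea could probably be written as sorted <- sort(rules, by = "confidence") followed by sorted[!duplicated(items(sorted))], since items() returns the union of LHS and RHS for each rule. I have not run that variant against your data, and with rules longer than two items it would also collapse rules that split the same itemset differently.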

Thanks to the rdatamining Titanic example I found this solution to prune redundant rules:

rules.sorted <- sort(rules, by = "lift")
# sparse = FALSE gives a dense logical matrix; recent arules versions return a sparse one by default
subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse = FALSE)
# Blank out the lower triangle and diagonal so each rule is only compared with higher-lift rules
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)
# plot() for rules comes from the arulesViz package
plot(rules.pruned, method = "graph", control = list(type = "items"))

I know this is an old post, but I found it when I began to look into the same question. Just to point people towards a more complete resource than your link: the CRAN Intro to arules explains how to use the normal subsetting capabilities in R to prune unwanted rules (middle of page 26): "As typical for association rule mining, the number of rules found is huge. To analyze these rules, for example, subset() can be used to produce separate subsets of rules for each item" and then goes on to explain and give examples.
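As a rough illustration of that kind of subsetting, here is a minimal sketch. It assumes the arules package is installed and uses its bundled Groceries data in place of the asker's orders object; the item "whole milk" and the lift threshold of 2 are arbitrary choices for the example.

```r
library(arules)

# Groceries ships with arules; it stands in for the asker's `orders` data.
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))

# Keep only rules whose consequent is "whole milk" and whose lift exceeds 2.
milk_rules <- subset(rules, subset = rhs %in% "whole milk" & lift > 2)
inspect(milk_rules)
```

The same subset() call also accepts conditions on lhs, support and confidence, so several filters can be combined in one expression.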
