
R pruning mining rules - apriori

I have a question about rules obtained using apriori in R.

After data input, conversion and so on, I typed the command

rules <- apriori(orders, parameter = list(supp = 0.01, conf = 0.5, maxlen=2))

and I get my rules. For example:

 lhs        rhs        support confidence      lift
1  {16058} => {16059} 0.01218522  0.9375000 67.886029
2  {16059} => {16058} 0.01218522  0.8823529 67.886029
3  {10049} => {10021} 0.01462226  0.7826087 34.406832
4  {10021} => {10049} 0.01462226  0.6428571 34.406832

My question is: is there a way to prune rules that are not interesting for me? In this case I'd like to see just the first and the third rule, to avoid "circular" rules that connect the same two items in both directions.

Thank you!

I will provide an alternative solution to the rdatamining Titanic example, as I found that approach to be very inefficient for larger rule bases. This problem is mentioned in the CRAN "Introduction to arules" vignette (p. 14).

Instead of building the whole rule-by-rule subset matrix (which can become very large), you can use the is.redundant function from arules. It flags every more specific rule (same consequent/RHS, but more items in the LHS/antecedent) whose lift/confidence/[other measure] is equal to or lower than that of a more general rule.

More formally, a rule X => Y is redundant if there exists a more general rule X' => Y with X' a subset of X whose lift (or confidence) is at least as high; see the formal problem description in the arules documentation for is.redundant.

The code would then look as follows using lift as measure for improvement:

library(arules)

rules <- apriori(df, parameter = list(supp = 0.01, conf = 0.5, target = "rules"))
rules_lift <- sort(rules, by = "lift")  # sorts in decreasing order by default
rules_pruned <- rules_lift[!is.redundant(rules_lift, measure = "lift")]
inspect(head(rules_pruned, 20))

I give this solution since you referred to the Titanic example as a solution to your problem. This approach, however, does not remove the "circular" rules you mention in the problem description, since for those the consequent Y is not the same.
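For the circular case itself, one simple idea is to group the length-2 rules by their (unordered) item pair and keep only the direction with the higher confidence. A minimal sketch in base R, using a mock data frame shaped like the inspect() output from the question (with a real arules rules object you would first extract labels(lhs(rules)), labels(rhs(rules)) and quality(rules)$confidence; note that if both directions tie on confidence, both rows are kept):

```r
# Mock of the inspect() output from the question
rules_df <- data.frame(
  lhs        = c("{16058}", "{16059}", "{10049}", "{10021}"),
  rhs        = c("{16059}", "{16058}", "{10021}", "{10049}"),
  confidence = c(0.9375000, 0.8823529, 0.7826087, 0.6428571),
  stringsAsFactors = FALSE
)

# Order-independent key: {A}=>{B} and {B}=>{A} get the same key
key <- apply(rules_df[, c("lhs", "rhs")], 1,
             function(x) paste(sort(x), collapse = "|"))

# Within each pair, keep the row(s) with maximal confidence
keep <- rules_df$confidence == ave(rules_df$confidence, key, FUN = max)
rules_pruned <- rules_df[keep, ]
rules_pruned
```

On the example data this keeps exactly the first and third rule, as asked for in the question.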

Thanks to the rdatamining Titanic example I found this solution to prune redundant rules:

library(arules)
library(arulesViz)  # needed for plot() on rules

rules.sorted <- sort(rules, by = "lift")
# subset.matrix[i, j] is TRUE if the items of rule i are a subset of rule j's
subset.matrix <- is.subset(rules.sorted, rules.sorted)
# keep only the upper triangle so each pair is compared once
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
# a rule is redundant if a higher-lift rule is contained in it
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)
plot(rules.pruned, method = "graph", control = list(type = "items"))

I know this is an old post, but I found it when I began to look into the same question. Just to point people towards a more complete resource than your link: the CRAN Intro to arules explains how to use R's normal subsetting capabilities to prune unwanted rules (middle of page 26): "As typical for association rule mining, the number of rules found is huge. To analyze these rules, for example, subset() can be used to produce separate subsets of rules for each item", and then goes on to explain and give examples.
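To illustrate the idea, here is a small sketch of that subsetting style on a mock data frame shaped like the inspect() output from the question; with a real arules rules object the equivalent call would be along the lines of subset(rules, subset = lift > 40 & confidence > 0.9):

```r
# Mock of the inspect() output from the question
rules_df <- data.frame(
  lhs        = c("{16058}", "{16059}", "{10049}", "{10021}"),
  rhs        = c("{16059}", "{16058}", "{10021}", "{10049}"),
  confidence = c(0.9375000, 0.8823529, 0.7826087, 0.6428571),
  lift       = c(67.886029, 67.886029, 34.406832, 34.406832),
  stringsAsFactors = FALSE
)

# Keep only rules passing both quality thresholds
high_quality <- subset(rules_df, confidence > 0.8 & lift > 40)
high_quality
```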
