R - Is there a way to limit apriori rules by lift?
I'm looking at this data set: https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data
I preprocessed the data:
library(arules)  # provides discretize() and apriori()
ca.1 <- read.csv("CreditApproval.csv", header = TRUE, sep = ",")
# From http://stackoverflow.com/q/4787332/
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
ca.1$A2<-remove_outliers(ca.1$A2)
ca.1$A3<-remove_outliers(ca.1$A3)
ca.1$A8<-remove_outliers(ca.1$A8)
ca.1$A11<-remove_outliers(ca.1$A11)
ca.1$A14<-remove_outliers(ca.1$A14)
ca.1$A15<-remove_outliers(ca.1$A15)
ca.1$A2<-discretize(ca.1$A2,"frequency",categories = 6)
ca.1$A3<-discretize(ca.1$A3,"frequency",categories = 6)
ca.1$A8<-discretize(ca.1$A8,"frequency",categories = 6)
ca.1$A11<-discretize(ca.1$A11,"frequency",categories = 6)
ca.1$A14<-discretize(ca.1$A14,"frequency",categories = 6)
ca.1$A15<-discretize(ca.1$A15,"frequency",categories = 6)
ca.1<-na.omit(ca.1)
After fine-tuning the support, confidence, and min/max length, I'm still getting 65 rules:
> rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4), appearance=list(rhs=c("class=-", "class=+"), default="lhs"))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
    lhs                  rhs       support   confidence lift
[1] {A5=g,A9=t,A10=t} => {class=+} 0.1521739 0.8974359  2.770607
[2] {A4=u,A9=t,A10=t} => {class=+} 0.1521739 0.8974359  2.770607
[3] {A1=a,A9=f}       => {class=-} 0.1717391 0.9753086  1.442579
[4] {A1=a,A9=f,A13=g} => {class=-} 0.1608696 0.9736842  1.440176
...[65]
As you can see, the + rules have a greater lift, but less support and confidence, than the - rules. I've been looking through the docs and can't find any parameter to limit by lift. Is this possible? If not, what do you do in situations like this?
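A note on why the + rules can have higher lift with lower confidence: lift is confidence divided by the support of the right-hand side, so a rule predicting the rarer class gets a larger lift for the same confidence. The sketch below is plain arithmetic on the numbers from the output above; the variable names are only illustrative.

```r
# Values copied from rules [1] and [3] in the inspect() output above
conf_plus  <- 0.8974359; lift_plus  <- 2.770607   # rhs is class=+
conf_minus <- 0.9753086; lift_minus <- 1.442579   # rhs is class=-

# lift = confidence / support(rhs), hence support(rhs) = confidence / lift
supp_plus  <- conf_plus  / lift_plus
supp_minus <- conf_minus / lift_minus

# Implied share of each class in the preprocessed data
print(round(c(plus = supp_plus, minus = supp_minus), 3))
```

The two implied class shares come out to roughly one third for + and two thirds for -, which is why the - rules need much higher confidence to reach the same lift.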
The arules package defines a subset() method for rule objects. To filter out rules with a lift below 2, you can try the following:
subset(rules, subset = lift > 2)
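For a version you can run end to end, here is a minimal sketch that uses the Groceries transactions shipped with arules as a stand-in for ca.1. The thresholds and the item label "whole milk" are arbitrary choices for illustration, not values from the question.

```r
library(arules)
data("Groceries")  # example data bundled with arules, used here instead of ca.1

# Mine rules with support/confidence thresholds first (apriori has no lift threshold)
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only rules whose lift exceeds 2
high_lift <- subset(rules, subset = lift > 2)

# Conditions on quality measures and on items can be combined
milk_high_lift <- subset(rules, subset = rhs %in% "whole milk" & lift > 2)

inspect(high_lift)
```

Inside subset() the quality columns (support, confidence, lift, ...) can be referred to by name, which keeps the filter readable.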
You can't limit apriori rules by lift alone. You first have to limit them by support and confidence, which you did here:
rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4))
Then, after that, do something like this:
# Note: lift < 2 keeps the low-lift rules; use lift > 2 to keep the high-lift ones
rulesLift <- sort(subset(rules, subset = lift < 2), by="lift")
inspect(rulesLift)
Another way is to use arules::quality(). For example:
association.rules <- apriori(tr, parameter = list(support=0.005, confidence=0.25, minlen=3, maxlen=10))
subRules<-association.rules[quality(association.rules)$lift > 1]
quality() returns the rules' quality measures, so this approach can filter by support, confidence, coverage, lift, or count.
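A slightly fuller sketch of the same idea, again assuming the bundled Groceries data as a stand-in and arbitrary thresholds: quality() gives back an ordinary data frame, so you can look at the spread of lift values before picking a cutoff and then index the rules with a logical vector.

```r
library(arules)
data("Groceries")  # stand-in data set shipped with arules

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 control = list(verbose = FALSE))

q <- quality(rules)        # data frame with one row per rule
print(summary(q$lift))     # inspect the lift distribution before choosing a cutoff

# Logical index built from several quality measures (thresholds are illustrative)
keep <- q$lift > 1.5 & q$support > 0.02
kept_rules <- rules[keep]
```

Compared with subset(), this form is handy when the cutoff itself is computed, for example from a quantile of q$lift.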
I think the apriori function does not accept lift as one of its parameters. I get this error if I try to set lift:
Error: Invalid parameter: lift
Instead, I could sort the rules by lift and pick rules based on the lift value, as follows:
sort(rules, by="lift", decreasing=TRUE)
This is not a straightforward solution, but it is a decent workaround.
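To make the "sort, then pick" step concrete, here is a small sketch that keeps the five rules with the highest lift. It assumes the Groceries data bundled with arules in place of the original data, and the choice of five is arbitrary.

```r
library(arules)
data("Groceries")  # stand-in data set shipped with arules

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 control = list(verbose = FALSE))

# Sort by lift (highest first) and keep the top five rules
rules_sorted <- sort(rules, by = "lift", decreasing = TRUE)
top5 <- head(rules_sorted, n = 5)

inspect(top5)
```

This picks a fixed number of rules rather than applying a lift threshold, which can be easier to reason about when you do not know the range of lift values in advance.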
What if you tried:
apriori(df, parameter = list(lift = 0.3, minlen = 2))
You can set your minimum lift to anything in this case; I just chose 0.3.