
Association rule mining in R - arules package and RStudio

I am mining patterns in a dataset of 1000 transactions covering 14 commodities. Each transaction has a 0 or 1 in each product column, depending on whether that product was purchased. Most of the values are 0.

When I run the apriori algorithm on this dataset, the top rules are for products that were not purchased, such as {var1=0, var2=0, var3=0} => {var4=0}. However, I am more interested in which products are purchased together.

Dataset:

Trans var1 var2 var3 var4
1     1    0    1    1
2     0    0    0    1
3     0    0    1    0
4     0    0    0    1
5     1    0    1    0
6     1    0    0    0
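As a quick sanity check on the sample rows, co-purchase counts can be computed directly in base R before involving arules. This is only a sketch: the names purchases, co_counts, support_13 and confidence_13 are made up for illustration, and only the six sample rows shown here are used, not the full 1000-transaction dataset.

```r
# Sample rows from the question (Trans column omitted); 1 = purchased.
purchases <- matrix(
  c(1, 0, 1, 1,
    0, 0, 0, 1,
    0, 0, 1, 0,
    0, 0, 0, 1,
    1, 0, 1, 0,
    1, 0, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("var1", "var2", "var3", "var4"))
)

# Entry [i, j] counts the transactions containing both item i and item j;
# the diagonal holds each item's own purchase count.
co_counts <- crossprod(purchases)

# Support of {var1, var3}: share of transactions containing both items.
support_13 <- co_counts["var1", "var3"] / nrow(purchases)

# Confidence of {var1} => {var3}: co-purchases divided by var1 purchases.
confidence_13 <- co_counts["var1", "var3"] / co_counts["var1", "var1"]

print(co_counts)
```

On these six rows var1 and var3 are bought together twice, so the support of {var1, var3} is 2/6 and the confidence of {var1} => {var3} is 2/3. Only the 1 entries contribute to these counts, which is the kind of pattern the question is after.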

rules <- apriori(dataset,
 parameter = list(minlen=3, supp=0.002, conf=0.2),
 appearance = list(rhs=c("var1=1","var2=1","var3=1"),
 lhs=c("var1=1","var2=1","var3=1"),
 default="none"),
 control = list(verbose=F))

First, RStudio crashes when I try to run this. Second, I would like to run the code like this:

rules <- apriori(dataset,
 parameter = list(minlen=3, supp=0.002, conf=0.2),
 appearance = list(rhs!=c("var1=0","var2=0","var3=0"),
 lhs!=c("var1=0","var2=0","var3=0"),
 default="none"),
 control = list(verbose=F))

This throws an error!

The difference: != and 0 instead of 1, so that I get patterns only on items that were purchased, not on items that were not purchased.

Thanks in advance!

I was able to find a workaround for this problem:

I converted the data frame into a matrix, and I no longer get patterns on items that were not purchased. Maybe this is how the algorithm works, or maybe (hopefully) there is a mistake in my approach.

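library(arules)

m <- as.matrix(dataset[, -1])  # drop the transaction id column
# label the rows; names() would not set row names on a matrix
rownames(m) <- paste("Transaction", rownames(dataset))
rules.all <- apriori(as(m, "transactions"),
                     parameter = list(support = 0.1, confidence = 0.8))
inspect(rules.all)

# sort by lift, strongest associations first
rules.sorted <- sort(rules.all, by = "lift")
inspect(rules.sorted)

# flag rules that are subsets of a rule ranked higher by lift
# (sparse = FALSE assumes a recent arules version; it requests a dense
# matrix so that the lower.tri() indexing works)
subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse = FALSE)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)

plot(rules.all)  # plotting rules requires the arulesViz package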
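
For comparison, the exclusion the question attempted with != can probably be expressed through the none entry of the appearance list, which names items that may not appear in any rule. The following is a sketch under assumptions rather than a verified fix for the original crash: it rebuilds the six sample rows as factor columns (so the items are labelled "var1=0", "var1=1", and so on), and the names toy, excluded and rules.bought are hypothetical.

```r
library(arules)

# Six sample rows from the question, stored as factors so that each
# column=value pair becomes an item such as "var1=0" or "var1=1".
toy <- data.frame(
  var1 = c(1, 0, 0, 0, 1, 1),
  var2 = c(0, 0, 0, 0, 0, 0),
  var3 = c(1, 0, 1, 0, 1, 0),
  var4 = c(1, 1, 0, 1, 0, 0)
)
toy[] <- lapply(toy, factor, levels = c(0, 1))

# Items that must not appear on either side of a rule.
excluded <- paste0(names(toy), "=0")

rules.bought <- apriori(
  toy,
  parameter = list(minlen = 2, supp = 0.1, conf = 0.5),
  appearance = list(none = excluded, default = "both"),
  control = list(verbose = FALSE)
)

inspect(sort(rules.bought, by = "lift"))
```

With the "=0" items excluded, the remaining rules involve only purchased products, which should match what the matrix conversion produces.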

Thanks!
