R - arules package association rules as a dataframe
I have generated association rules in R using the arules package. Rules have been generated for 6 columns/fields.
What I would like to have is a data frame consisting of 6 columns, with these columns populated by the association rules.
E.g.:
This should be put into a data frame in this way.
This requires some coding and an understanding of the data structures used in R and arules. Here is some code that (hopefully) does what you want.
library(arules)
# create some data
dat <- data.frame(
  Sex    = c("M", "F", "M"),
  Status = c("Y", "Y", "N"),
  Job    = c("Y", "Y", "N"),
  Loan   = c("Y", "Y", "N")
)
trans <- as(dat, "transactions")
itemInfo(trans)
#     labels variables levels
# 1    Sex=F       Sex      F
# 2    Sex=M       Sex      M
# 3 Status=N    Status      N
# 4 Status=Y    Status      Y
# 5    Job=N       Job      N
# 6    Job=Y       Job      Y
# 7   Loan=N      Loan      N
# 8   Loan=Y      Loan      Y
# arulesCBA can mine classification rules (CARs) with items for the
# class variable in the RHS.
library(arulesCBA)
rules <- mineCARs(Loan ~ ., trans, parameter = list(supp = 1/3, conf = 0))
inspect(head(rules))
#     lhs           rhs      support   confidence lift count
# [1] {}         => {Loan=N} 0.3333333 0.3333333  1.0  1
# [2] {}         => {Loan=Y} 0.6666667 0.6666667  1.0  2
# [3] {Sex=F}    => {Loan=Y} 0.3333333 1.0000000  1.5  1
# [4] {Status=N} => {Loan=N} 0.3333333 1.0000000  3.0  1
# [5] {Job=N}    => {Loan=N} 0.3333333 1.0000000  3.0  1
# [6] {Sex=M}    => {Loan=N} 0.3333333 0.5000000  1.5  1
# rules store information about how the items relate to the original variables
ii <- itemInfo(rules)
ii
#     labels variables levels
# 1    Sex=F       Sex      F
# 2    Sex=M       Sex      M
# 3 Status=N    Status      N
# 4 Status=Y    Status      Y
# 5    Job=N       Job      N
# 6    Job=Y       Job      Y
# 7   Loan=N      Loan      N
# 8   Loan=Y      Loan      Y
# start with translating the rules into a logical matrix
m <- as(items(rules), "matrix")
head(m)
#      Sex=F Sex=M Status=N Status=Y Job=N Job=Y Loan=N Loan=Y
# [1,] FALSE FALSE    FALSE    FALSE FALSE FALSE   TRUE  FALSE
# [2,] FALSE FALSE    FALSE    FALSE FALSE FALSE  FALSE   TRUE
# [3,]  TRUE FALSE    FALSE    FALSE FALSE FALSE  FALSE   TRUE
# [4,] FALSE FALSE     TRUE    FALSE FALSE FALSE   TRUE  FALSE
# [5,] FALSE FALSE    FALSE    FALSE  TRUE FALSE   TRUE  FALSE
# [6,] FALSE  TRUE    FALSE    FALSE FALSE FALSE   TRUE  FALSE
# do some R tricks to create the data.frame
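df <- do.call(cbind,
  lapply(unique(ii$variables), FUN = function(var) {
    # columns of m that hold the items (levels) of this variable
    cols <- which(ii$variables == var)
    # weight each item column by its position (1, 2, ...) and take the maximum
    # per rule; 0 means the rule has no item for this variable and becomes NA
    idx <- apply(t(m[, cols]) * seq_along(cols), MARGIN = 2, max)
    df <- data.frame(factor(idx,
                            levels = seq_along(cols),
                            labels = ii$levels[cols]))
    colnames(df) <- var
    df
  }))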
# add quality measures
df <- cbind(df, quality(rules))
head(df)
#    Sex Status  Job Loan   support confidence lift count
# 1 <NA>   <NA> <NA>    N 0.3333333  0.3333333  1.0     1
# 2 <NA>   <NA> <NA>    Y 0.6666667  0.6666667  1.0     2
# 3    F   <NA> <NA>    Y 0.3333333  1.0000000  1.5     1
# 4 <NA>      N <NA>    N 0.3333333  1.0000000  3.0     1
# 5 <NA>   <NA>    N    N 0.3333333  1.0000000  3.0     1
# 6    M   <NA> <NA>    N 0.3333333  0.5000000  1.5     1
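
If the "R tricks" step is hard to follow, the following base-R sketch may help. It runs without arules on a small hand-built logical matrix and converts it into one factor column per variable. The names `info`, `m_demo` and `items_to_columns()` are made up for this illustration; they only mimic the shape of `itemInfo(rules)` and `as(items(rules), "matrix")` above. The sketch assumes that a rule contains at most one item per variable, which should hold when the transactions were created from a data.frame.

```r
# Hand-built stand-ins for itemInfo(rules) and as(items(rules), "matrix").
info <- data.frame(
  labels    = c("Sex=F", "Sex=M", "Loan=N", "Loan=Y"),
  variables = c("Sex", "Sex", "Loan", "Loan"),
  levels    = c("F", "M", "N", "Y"),
  stringsAsFactors = FALSE
)
m_demo <- rbind(
  c(FALSE, FALSE, TRUE,  FALSE),  # {}      => {Loan=N}
  c(TRUE,  FALSE, FALSE, TRUE),   # {Sex=F} => {Loan=Y}
  c(FALSE, TRUE,  TRUE,  FALSE)   # {Sex=M} => {Loan=N}
)
colnames(m_demo) <- info$labels

# Hypothetical helper: returns a data.frame with one factor column per
# variable; NA marks rules that contain no item of that variable.
items_to_columns <- function(m, info) {
  vars <- unique(info$variables)
  cols_per_var <- lapply(vars, function(v) {
    cols <- which(info$variables == v)
    sub <- m[, cols, drop = FALSE]
    # position of the item present for this variable, or 0 if none is present
    idx <- apply(sub, 1, function(r) if (any(r)) which(r)[1] else 0L)
    # 0 is not among the levels, so it is turned into NA
    factor(idx, levels = seq_along(cols), labels = info$levels[cols])
  })
  names(cols_per_var) <- vars
  as.data.frame(cols_per_var)
}

demo_df <- items_to_columns(m_demo, info)
print(demo_df)
#    Sex Loan
# 1 <NA>    N
# 2    F    Y
# 3    M    N
```

The idea is the same as in the code above: each item is replaced by its position within its variable, and a position of 0 (no item present) falls outside the factor levels and therefore shows up as `<NA>`.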