How to extract information from apriori in R (association rules)

I am doing some association rule mining in R and want to extract my results so I can build reports. My results look like this:

> inspect(rules[1:3])
  lhs          rhs                         support confidence lift
1 {apples} => {oranges}                    0.00029       0.24  4.4
2 {apples} => {pears}                      0.00022       0.18 45.6
3 {apples} => {pineapples}                 0.00014       0.12  1.8

How do I extract the "rhs" here, i.e. a vector of oranges, pears and pineapples?

Further, how do I extract information out of the summary, i.e.

> summary(rules)
set of 3 rules

rule length distribution (lhs + rhs):sizes
2 
3 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       2       2       2       2       2 

The data type is "S4". I have no problem extracting values when the output is a list, etc., but how do you do the equivalent here? I want to extract the "3" from "set of 3 rules".

I have gotten as far as using "@" (see "What does the @ symbol mean in R?").

But once I use that, how do I turn my results into a vector, i.e.

inspect(rules@rhs)
1 {oranges}
2 {pears}
3 {pineapples}

becomes a character vector of length 3?

To answer your second question: length(rules)
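As a rough sketch of that count in context, assuming the arules package and its bundled Adult data set (the thresholds and the variable names n_rules, rule_sizes and size_table are illustrative, not from the question):

```r
library(arules)
data("Adult")

# Thresholds are arbitrary; they are only meant to produce a small rule set.
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# Number of rules: the "3" in "set of 3 rules".
n_rules <- length(rules)

# Rule lengths (lhs + rhs), the quantity behind the
# "rule length distribution" that summary() prints.
rule_sizes <- size(lhs(rules)) + size(rhs(rules))
size_table <- table(rule_sizes)

print(n_rules)
print(size_table)
```

Both values are ordinary R objects (an integer and a table), so they can go straight into a report.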

Now about your first question:

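library("arules")
data("Adult")
## Mine association rules.
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)

## Number of rules in the set.
n <- length(rules)

## One label per rule, of the form "{lhs} => {rhs}".
everything <- labels(rules)
# print(everything)

## Split each label on "=> " and keep every second piece, i.e. the rhs
## (curly braces are still included).
rhs_labels <- unlist(strsplit(everything, "=> ", fixed = TRUE))[seq(2, 2 * n, by = 2)]
print(rhs_labels)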

Don't hesitate to ask if you have a question; this might be a bit dense :-)
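A possible alternative that avoids string splitting, sketched under the assumption that the arules accessor rhs() and the labels() method for item sets are available in your version (the names rhs_with_braces and rhs_plain are made up for this example):

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# rhs() returns the consequents as an itemMatrix;
# labels() renders one string per rule.
rhs_with_braces <- labels(rhs(rules))

# setStart / setEnd control the brackets placed around each item set.
rhs_plain <- labels(rhs(rules), setStart = "", setEnd = "")

print(head(rhs_with_braces))
print(head(rhs_plain))
```

The result is a plain character vector with one element per rule, which is what the question asks for.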

inspect isn't returning anything, just printing its output. When this happens, you can use the function capture.output if you want to save the output as a string. For example, to get the rhs:

library(arules)
data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4))
inspect(rules[1:3])
#   lhs    rhs                              support confidence lift
# 1 {}  => {race=White}                   0.8550428  0.8550428    1
# 2 {}  => {native-country=United-States} 0.8974243  0.8974243    1
# 3 {}  => {capital-gain=None}            0.9173867  0.9173867    1

## Capture it, and extract rhs
out <- capture.output(inspect(rules[1:3]))
gsub("[^{]+\\{([^}]*)\\}[^{]+\\{([^}]*)\\}.*", "\\2", out)[-1]
# [1] "race=White"                   "native-country=United-States"
# [3] "capital-gain=None"           

However, it looks like you can just access this information from the rules with the function rhs:

str(rhs(rules)@itemInfo)
# 'data.frame': 115 obs. of  3 variables:
#  $ labels   :Class 'AsIs'  chr [1:115] "age=Young" "age=Middle-aged" "age=Senior" "age=Old" ...
#  $ variables: Factor w/ 13 levels "age","capital-gain",..: 1 1 1 1 13 13 13 13 13 13 ...
#  $ levels   : Factor w/ 112 levels "10th","11th",..: 111 63 92 69 30 54 65 82 90 91 ...

In general, use str to see what objects are made of so you can decide how to extract components.
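For S4 objects such as rules, a few other inspection tools may help alongside str. The following is only a sketch; it assumes the arules accessors quality() and rhs(), and the variable names are illustrative:

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(support = 0.4),
                 control = list(verbose = FALSE))

# Check that the object is S4 and list the slots reachable with "@".
print(isS4(rules))
slot_names <- methods::slotNames(rules)
print(slot_names)

# Accessor functions are usually a safer route than "@":
# quality() returns the interest measures as a plain data.frame.
q <- quality(rules)
str(q)

# rhs() returns the consequents as an itemMatrix.
rhs_items <- rhs(rules)
print(class(rhs_items))
```

Once a component is an ordinary data.frame or vector, the usual list-style extraction with $ and [ works again.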

You can extract the RHS as a character vector of item names (without extraneous text like '=>' or curly brackets) as follows:

rules@rhs@itemInfo$labels[rules@rhs@data@i+1]

The index values stored in rules@rhs@data@i range from 0 to one less than the number of unique labels. Because R vectors are 1-based, indexing the labels requires adding 1 to avoid attempting to access the 0th element of rules@rhs@itemInfo$labels.
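To sanity-check the index shift, the slot-based result can be compared with the accessor-based one. This sketch assumes every rule has exactly one item on its right-hand side (which is what apriori mines), so that the sparse matrix holds one index per rule; the variable names are illustrative:

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(support = 0.4),
                 control = list(verbose = FALSE))

# 0-based row indices from the sparse item matrix behind the rhs.
idx0 <- rules@rhs@data@i
print(range(idx0))

# Shift by one to index the 1-based label vector.
item_labels <- rules@rhs@itemInfo$labels
via_slots <- as.character(item_labels[idx0 + 1])

# Same information through the accessor functions.
via_accessor <- unname(labels(rhs(rules), setStart = "", setEnd = ""))

print(all(via_slots == via_accessor))
```

If a rule set ever contained multi-item consequents, the slot-based vector would be longer than the number of rules, so the accessor route is the more robust of the two.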

Perhaps this wasn't an option at the time this question was asked, but there is a DATAFRAME() function that converts the rules object to a data.frame, from which it is a bit easier to extract what you want. You can even have it exclude the curly braces and set whatever separator you like between items in the item sets.

Borrowing the example from the accepted answer,

data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4))

We can now turn that into a data.frame and do useful things:

rule_data <- DATAFRAME(rules, 
                       separate = TRUE, 
                       setStart = '', 
                       itemSep = ',', 
                       setEnd = '')

> str(rule_data)
'data.frame':   169 obs. of  6 variables:
 $ LHS       : Factor w/ 76 levels "","relationship=Husband",..: 1 1 1 1 2 3 2 3 3 3 ...
 $ RHS       : Factor w/ 7 levels "race=White","native-country=United-States",..: 1 2 3 4 5 6 7 7 1 2 ...
 $ support   : num  0.855 0.897 0.917 0.953 0.403 ...
 $ confidence: num  0.855 0.897 0.917 0.953 0.999 ...
 $ lift      : num  1 1 1 1 2.18 ...
 $ count     : int  41762 43832 44807 46560 19704 19704 19715 19899 20054 20003 ...

> rule_data$RHS[1:5]
[1] race=White                       
[2] native-country=United-States     
[3] capital-gain=None                
[4] capital-loss=None                
[5] marital-status=Married-civ-spouse
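Since RHS is printed as a factor in the output above, a character vector needs one more conversion. A minimal sketch, in which the lift cutoff of 1.1 and the names rhs_chr and strong are arbitrary choices for illustration:

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(support = 0.4),
                 control = list(verbose = FALSE))

rule_data <- DATAFRAME(rules,
                       separate = TRUE,
                       setStart = "",
                       itemSep = ",",
                       setEnd = "")

# Convert the RHS column to a plain character vector.
rhs_chr <- as.character(rule_data$RHS)
print(head(rhs_chr))

# Ordinary data.frame operations now apply, e.g. keep the rules
# above an (arbitrary) lift cutoff and sort them by lift.
strong <- rule_data[rule_data$lift > 1.1, c("LHS", "RHS", "lift")]
strong <- strong[order(-strong$lift), ]
print(head(strong))
```

as.character() is harmless if a given arules version already returns character columns, so the conversion is safe either way.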
