Building the "transactions" Class for Association Rule Mining in SparkR using arules and apriori
I am using SparkR and trying to convert a SparkDataFrame to "transactions" in order to mine associations between items/products.
I have found a similar example at https://blog.aptitive.com/building-the-transactions-class-for-association-rule-mining-in-r-using-arules-and-apriori-c6be64268bc4, but it only applies if you are working with an R data.frame. I currently have my data in this format:
CUSTOMER_KEY_h PRODUCT_CODE
1 SAVE
1 CHEQ
1 LOAN
1 LOAN
1 CARD
1 SAVE
2 CHEQ
2 LOAN
2 CTSAV
2 SAVE
2 CHEQ
2 SAVE
2 CARD
2 CARD
3 LOAN
3 CTSAV
4 SAVE
5 CHEQ
5 SAVE
5 CARD
5 LOAN
5 CARD
6 CHEQ
6 CHEQ
and would like to end up with something like this:
CUSTOMER_KEY_h PRODUCT_CODE
1 {SAVE, CHEQ, LOAN, LOAN, CARD, SAVE}
2 {CHEQ, LOAN, CTSAV, SAVE, CHEQ, SAVE, CARD, CARD}
3 {LOAN, CTSAV}
4 {SAVE}
5 {CHEQ, SAVE, CARD, LOAN, CARD}
6 {CHEQ, CHEQ}
Alternatively, if I could get the equivalent of this R script in SparkR:
df2 <- apply(df, 2, as.logical)
that would be helpful.
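For reference, the closest SparkR analogue is a per-column cast rather than apply. A minimal sketch, assuming df2 is a SparkDataFrame whose columns hold 0/1 indicators (columns, cast, and withColumn are standard SparkR functions):
library(SparkR)
# cast every column of the SparkDataFrame to boolean;
# withColumn replaces an existing column when the name is already taken
for (col_name in columns(df2)) {
  df2 <- withColumn(df2, col_name, cast(df2[[col_name]], "boolean"))
}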
The arules package is not compatible with SparkR. If you want to explore association rules on Spark, you should use Spark's own utilities. First use collect_set to combine records:
library(SparkR)
library(magrittr)

sparkR.session()  # assumes a running Spark session

df <- createDataFrame(data.frame(
  CUSTOMER_KEY_h = c(
    1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 5, 6, 6),
  PRODUCT_CODE = c(
    "SAVE", "CHEQ", "LOAN", "LOAN", "CARD", "SAVE", "CHEQ", "LOAN", "CTSAV", "SAVE",
    "CHEQ", "SAVE", "CARD", "CARD", "LOAN", "CTSAV", "SAVE", "CHEQ", "SAVE", "CARD",
    "LOAN", "CARD", "CHEQ", "CHEQ")
))

# collect one array of distinct products per customer
baskets <- df %>%
  groupBy("CUSTOMER_KEY_h") %>%
  agg(alias(collect_set(column("PRODUCT_CODE")), "items"))
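Note that collect_set de-duplicates within each basket (customer 1's repeated SAVE and LOAN entries appear only once), which is what Spark's FP-growth implementation requires: transactions with duplicate items are rejected. You can peek at the grouped result to verify:
head(baskets)  # one row per customer: CUSTOMER_KEY_h plus an array column "items"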
Fit the model (please check the spark.fpGrowth docs for the full list of available options):
fpgrowth <- spark.fpGrowth(baskets)
and use it to extract association rules:
arules <- spark.associationRules(fpgrowth)
arules %>% head()
antecedent consequent confidence lift
1 CARD, LOAN SAVE 1 1.5
2 CARD, LOAN CHEQ 1 1.5
3 LOAN, SAVE, CHEQ CARD 1 2.0
4 SAVE, CHEQ LOAN 1 1.5
5 SAVE, CHEQ CARD 1 2.0
6 CARD, SAVE LOAN 1 1.5
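The support and confidence thresholds can be set explicitly through the minSupport and minConfidence arguments documented for spark.fpGrowth. A short sketch (the threshold values below are illustrative, not from the original answer), including pulling the rules into a local R data.frame with collect:
fpgrowth <- spark.fpGrowth(baskets, minSupport = 0.5, minConfidence = 0.8)
spark.freqItemsets(fpgrowth) %>% head()   # frequent itemsets and their frequencies
rules_local <- collect(spark.associationRules(fpgrowth))  # local R data.frame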
If you use Spark < 2.3.0 you can try replacing:
alias(collect_set(column("PRODUCT_CODE")), "items")
with
expr("collect_set(PRODUCT_CODE) AS items")