Choosing support and confidence values with ml_fpgrowth in sparklyr

Question

I am trying to take some inspiration from this Kaggle script, where the author uses arules to perform a market basket analysis in R. I am particularly interested in the section where they pass in vectors of confidence and support values and then plot the number of rules generated, to help choose optimal values rather than generating a massive number of rules.

I wish to try the same process, but I am using sparklyr/Spark with FP-growth in R, and I am struggling to achieve the same output, i.e. a count of rules for each confidence and support value.

From the limited examples and documentation, I believe I pass my transaction data to ml_fpgrowth along with my confidence and support values. This function generates a model, which then needs to be passed to ml_association_rules to generate the rules.

# CONVERT TABLE TO TRANSACTION FORMAT
trans <- medical_tbl %>% 
  group_by(alt_claim_id) %>%
  summarise(items = collect_list(proc_cd))

# SUPPORT AND CONFIDENCE VALUES
supportLevels <- c(0.1, 0.05, 0.01, 0.005)
confidenceLevels <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)

# EMPTY LISTS
model_sup10 <- vector("list", length = 9)
model_sup5 <- vector("list", length = 9)
model_sup1 <- vector("list", length = 9)
model_sup0.5 <- vector("list", length = 9)

# FP GROWTH ALGORITHM WITH A SUPPORT LEVEL OF 10%
for (i in 1:length(confidenceLevels)) {
  model_sup10[i] <- ml_fpgrowth(trans,
                                min_support = supportLevels[1],
                                min_confidence = confidenceLevels[i],
                                items_col = "items",
                                uid = random_string("fpgrowth_"))}

I tried checking the rules for one of the models above, model_sup10, but I cannot extract any rules. The code below gives the following error:

rules <- ml_association_rules(model_sup10[[1]][1])
Error: $ operator is invalid for atomic vectors

Can anyone help, or explain whether this is possible with fpgrowth? What is the best way to achieve my goal of plotting the number of rules generated for each support/confidence pairing?
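A likely cause of the error above is the single-bracket assignment `model_sup10[i] <- ...` in the loop. The sketch below reproduces the effect with plain R lists so that it runs without Spark. It assumes the fitted model is a list-like S3 object; `fake_model` and its fields are hypothetical stand-ins, not sparklyr's actual structure.

```r
# Hypothetical stand-in for a fitted model: a list-like S3 object.
# The field names are illustrative only.
fake_model <- structure(
  list(uid = "fpgrowth_abc", params = list(min_support = 0.1)),
  class = "fake_model"
)

models <- vector("list", length = 2)

# Single brackets replace ONE slot with the FIRST element of the value
# (R also emits a "not a multiple of replacement length" warning).
models[1] <- fake_model
first_is_string <- is.character(models[[1]])  # only the uid string was stored

# Using `$` on that string fails in the same way as in the question.
msg <- tryCatch(models[[1]]$uid, error = function(e) conditionMessage(e))

# Double brackets store the whole object in the slot.
models[[2]] <- fake_model
second_is_model <- inherits(models[[2]], "fake_model")
```

If that assumption holds, writing `model_sup10[[i]] <- ml_fpgrowth(...)` in the loop and calling `ml_association_rules(model_sup10[[1]])` without the trailing `[1]` should avoid the error.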

Answer 1

After some head banging with dplyr and sparklyr, I managed to cobble the following together. If anyone has feedback on how I can improve this code, please feel free to comment.

library(sparklyr)
library(dplyr)

# SUPPORT AND CONFIDENCE VALUES
supportLevels <- c(0.1, 0.05, 0.01, 0.005)
confidenceLevels <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)

# CREATE FUNCTION THAT FITS ONE MODEL FOR A SUPPORT/CONFIDENCE PAIR AND RETURNS THE NUMBER OF RULES GENERATED
testModelFunction <- function(i, j) {
  ml_fpgrowth(trans,
              min_support = as.numeric(i),
              min_confidence = as.numeric(j),
              items_col = "items",
              uid = random_string("fpgrowth_")) %>% 
    ml_association_rules() %>% 
    count(name = "rules") %>% 
    pull()
}

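# CREATE A LIST TO STORE THE OUTPUT FROM testModelFunction
library(data.table)  # provides rbindlist()

l <- list()
n <- 1

for (i in supportLevels) {
  for (j in confidenceLevels) {
    message(paste(i, j))
    # Assign the value returned by tryCatch(): an assignment made inside the
    # error handler would only change a copy of l local to the handler.
    l[[n]] <- tryCatch(
      list(supportLevels = i, confidenceLevels = j, n_rules = testModelFunction(i, j)),
      error = function(e) {
        # Store the message as text so the row still binds cleanly.
        list(supportLevels = i, confidenceLevels = j, error = conditionMessage(e))
      })
    n <- n + 1
  }
}

rbindlist(l, fill = TRUE)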
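To get from that table to the plot asked about in the question, something like the following base R sketch could work. `plot_rule_counts` is a hypothetical helper, not part of sparklyr or arules. It assumes a table with the columns `supportLevels`, `confidenceLevels` and `n_rules`, as produced above. The `demo` counts are made up purely to show the input shape and are not real output.

```r
# Hypothetical helper: plot rule counts against minimum confidence,
# with one line per support level.
plot_rule_counts <- function(results) {
  results <- as.data.frame(results)
  # Reshape to a matrix: rows = confidence levels, columns = support levels.
  counts <- tapply(results$n_rules,
                   list(results$confidenceLevels, results$supportLevels),
                   FUN = sum)
  conf <- as.numeric(rownames(counts))
  matplot(conf, counts, type = "b", pch = 1, lty = 1,
          col = seq_len(ncol(counts)),
          xlab = "Minimum confidence", ylab = "Number of rules")
  legend("topright", legend = paste("support", colnames(counts)),
         col = seq_len(ncol(counts)), lty = 1, pch = 1)
  invisible(counts)
}

# Made-up counts, only to illustrate the expected input; not real results.
demo <- expand.grid(confidenceLevels = c(0.9, 0.5, 0.1),
                    supportLevels = c(0.1, 0.05))
demo$n_rules <- c(1, 4, 9, 3, 12, 30)
counts <- plot_rule_counts(demo)
```

With the real results, the call would be `plot_rule_counts(rbindlist(l, fill = TRUE))`. Pairs that raised an error have a missing `n_rules` and simply leave a gap in the corresponding line.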

Answer 1 · 0 · accepted · 2020-01-03 10:24:55
