
Error deploying Azure ML experiment with R script for mining association rules

I created a new experiment in Azure Machine Learning Studio that uses the Execute R Script module to mine association rules from the starting dataset. For this experiment I used the R version Microsoft R Open 3.2.2.

I first wrote and tested the function used in the Azure ML experiment in RStudio, where it ran without any problems. This is the structure of my experiment: [image: Experiment]

This is part of the code inside the Azure ML module; it works properly in RStudio:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame

library("arules")
library("sqldf")

x <- sqldf('select ID_Ordine, AnnoOrdine, ZonaCommerciale, Modello, SUM(Qta) as Qta 
            from dataset1 group by ID_Ordine, Modello order by ID_Ordine')

a_list1 <- transform(x, Modello = as.factor(Modello),
                     ID_Ordine = as.factor(ID_Ordine)) 
transactions <- as(split(x[,"Modello"], x[,"ID_Ordine"]), "transactions")
rules <- sort(apriori(transactions,
                        parameter = list(supp = 0.1, conf = 0.1, target = "rules",
                                         maxlen = 5)), by="lift")
# Remove inverse duplicated rules (rules generated from the same itemset).
# Logical indexing is used because rules[-d] would drop every rule when d is empty.
gi <- generatingItemsets(rules)
rules <- rules[!duplicated(gi)]

#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)), 
                     label_rhs = labels(rhs(rules)),
                     count = quality(rules)["count"])

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("result");

If I remove the line count = quality(rules)["count"] (the statement that adds the count column to the output data frame), the experiment runs correctly. When I include the count column, the experiment fails with the following error: [screenshot of the error]

Does anyone know how to fix this error, or an alternative way to select the count column from the arules object that Azure ML recognizes?

Thanks for any suggestions.

The count column is not calculated by the function apriori() in this version of the arules package, so I computed it myself by inverting the support formula:

#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)), 
                     label_rhs = labels(rhs(rules)),
                     count = quality(rules)$support*length(transactions))

This works because support is calculated with the following formula:

support = (number of transactions with A&B)/(number of total transactions)
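Rearranging that formula gives count = support * (number of total transactions). The sketch below wraps this in a small helper that is safe to use whether or not the quality data frame already has a count column. The name rule_counts and the toy numbers are assumptions made for illustration; they are not part of arules or of the original script.

```r
# Hypothetical helper (not part of arules): return absolute rule counts from a
# quality data frame `q`, given the total number of transactions.
rule_counts <- function(q, n_transactions) {
  if ("count" %in% names(q)) {
    # Use the column directly when the quality data frame already provides it.
    as.integer(q$count)
  } else {
    # Otherwise invert support = count / n_transactions. round() guards
    # against floating-point error before converting to integer.
    as.integer(round(q$support * n_transactions))
  }
}

# Toy check with made-up numbers: rules covering 3 of 10 and 7 of 10 transactions.
q_old <- data.frame(support = c(0.3, 0.7), confidence = c(0.5, 0.9))
rule_counts(q_old, 10)
```

In the script above, this would presumably be called as rule_counts(quality(rules), length(transactions)) when building the result data frame. Rounding matters because support * length(transactions) is a floating-point product and may not be an exact integer.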
