sparklyr spark_apply user defined function error
I'm trying to pass a custom R function inside spark_apply but keep running into issues and can't figure out what some of the errors mean.
library(sparklyr)
sc <- spark_connect(master = "local")
perf_df <- data.frame(predicted = c(5, 7, 20),
                      actual = c(4, 6, 40))
perf_tbl <- sdf_copy_to(sc = sc,
                        x = perf_df,
                        name = "perf_table")
# custom function
ndcg <- function(predicted_rank, actual_rank) {
  # y is a vector of relevance scores
  DCG <- function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2))
  DCG(predicted_rank)/DCG(actual_rank)
}
# works in R using a regular R data frame
ndcg(perf_df$predicted, perf_df$actual)
# does not work
perf_tbl %>%
  spark_apply(function(e) ndcg(e$predicted, e$actual),
              names = "ndcg")
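For reference, this is what ndcg computes, written out as a formula and evaluated on the sample data (rounded to three decimals), which is the value the local call above returns:

$$\mathrm{DCG}(y) = y_1 + \sum_{i=2}^{n} \frac{y_i}{\log_2 i}, \qquad \mathrm{ndcg} = \frac{\mathrm{DCG}(\text{predicted})}{\mathrm{DCG}(\text{actual})}$$

$$\mathrm{ndcg} = \frac{5 + \frac{7}{\log_2 2} + \frac{20}{\log_2 3}}{4 + \frac{6}{\log_2 2} + \frac{40}{\log_2 3}} \approx \frac{24.619}{35.237} \approx 0.699$$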
OK, I see two possible problems:
(1) spark_apply expects a function with one parameter, a data frame.
(2) Depending on how complex the function is, you may need to put it in a package.
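Point (1) is about how spark_apply calls your function: it hands f the rows of each partition as an ordinary R data frame and combines what f returns. The block below is a purely local emulation of that contract so it can run without Spark. emulate_spark_apply is a hypothetical helper, not part of sparklyr, and its round-robin row split is only an illustration of partitioning, not how Spark actually partitions data. It also shows that when a table has more than one partition, ndcg ends up computed per partition rather than over the whole table.

```r
# Hypothetical helper, for illustration only: a rough local stand-in for the
# calling convention of spark_apply. It is NOT how sparklyr is implemented.
emulate_spark_apply <- function(df, f, n_partitions = 1) {
  # Assign rows to partitions round-robin (real Spark partitioning differs)
  partition_id <- rep_len(seq_len(n_partitions), nrow(df))
  parts <- split(df, partition_id)
  # f is called once per partition with a single data frame argument
  results <- lapply(parts, function(part) {
    res <- f(part)
    if (!is.data.frame(res)) res <- data.frame(result = res)
    res
  })
  # Stack the per-partition results into one data frame
  do.call(rbind, results)
}

# The question's two-argument ndcg
ndcg <- function(predicted_rank, actual_rank) {
  DCG <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
  DCG(predicted_rank) / DCG(actual_rank)
}
perf_df <- data.frame(predicted = c(5, 7, 20), actual = c(4, 6, 40))

# Wrapped in a one-argument function, as in the question
one_part <- emulate_spark_apply(perf_df, function(e) ndcg(e$predicted, e$actual))
two_parts <- emulate_spark_apply(perf_df, function(e) ndcg(e$predicted, e$actual),
                                 n_partitions = 2)
print(one_part)   # one row: ndcg over all three rows
print(two_parts)  # two rows: one ndcg value per partition
```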
Let's say you modify ndcg to receive a data frame as its parameter:
ndcg <- function(dataset) {
  predicted_rank <- dataset$predicted
  actual_rank <- dataset$actual
  # y is a vector of relevance scores
  DCG <- function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2))
  DCG(predicted_rank)/DCG(actual_rank)
}
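Because spark_apply passes a plain data frame, the one-argument version can be checked locally before involving Spark at all. A minimal sketch, repeating the function so the block runs on its own:

```r
# One-argument ndcg, as modified above
ndcg <- function(dataset) {
  predicted_rank <- dataset$predicted
  actual_rank <- dataset$actual
  # y is a vector of relevance scores
  DCG <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
  DCG(predicted_rank) / DCG(actual_rank)
}

perf_df <- data.frame(predicted = c(5, 7, 20), actual = c(4, 6, 40))

# Same call shape spark_apply uses: a single data frame in
result <- ndcg(perf_df)
print(result)
```

If this fails locally, it will fail on the workers too, and the local error message is usually easier to read.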
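And you put that in a package, for example one called ndcgPackage (R package names may only contain letters, numbers and dots, so a name like ndcg_package would be rejected).
With the package installed and loaded, your code will be similar to the following, where packages = TRUE asks sparklyr to distribute the packages in your local .libPaths() to the workers: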
spark_apply(perf_tbl, ndcg, packages = TRUE, names = "ndcg")
I'm doing this from memory, so there may be a few typos, but it should get you close.
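For a function as small as this one, a lighter option that may be worth trying before building a package is to define the helper inside the function you pass to spark_apply, so it does not rely on any user-defined objects from the driver's global environment. This is a sketch under the assumption that the original error comes from ndcg not being available on the workers, which the question does not confirm; ndcg_udf is a hypothetical name, and returning a one-column data frame is a choice made here to keep the output shape explicit.

```r
# Hypothetical self-contained UDF: the DCG helper lives inside the function,
# so it does not depend on user-defined objects in the global environment.
ndcg_udf <- function(df) {
  # DCG of a vector y of relevance scores
  dcg <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
  # Return a one-column data frame so the output shape is explicit
  data.frame(ndcg = dcg(df$predicted) / dcg(df$actual))
}

# Check it locally on a plain data frame before sending it to Spark
local_check <- ndcg_udf(data.frame(predicted = c(5, 7, 20),
                                   actual = c(4, 6, 40)))
print(local_check)

# With a live connection (not run here), the call would be along the lines of:
# spark_apply(perf_tbl, ndcg_udf, names = "ndcg")
```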