sparklyr spark_apply user defined function error

I'm trying to pass a custom R function inside spark_apply but keep running into issues and can't figure out what some of the errors mean.

library(sparklyr)
sc <- spark_connect(master = "local")
perf_df <- data.frame(predicted = c(5, 7, 20), 
                       actual = c(4, 6, 40))


perf_tbl <- sdf_copy_to(sc = sc,
                        x = perf_df,
                        name = "perf_table")

#custom function
ndcg <- function(predicted_rank, actual_rank) { 
  # x is a vector of relevance scores 
  DCG <- function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2)) 
  DCG(predicted_rank)/DCG(actual_rank) 
} 

#works in R using R data frame
ndcg(perf_df$predicted, perf_df$actual)


# does not work
perf_tbl %>%
  spark_apply(function(e) ndcg(e$predicted, e$actual),
              names = "ndcg")

OK, I'm seeing two possible problems.

(1) - spark_apply prefers functions that take a single parameter: a data frame.

(2) - you may need to make a package, depending on how complex the function is.

Let's say you modify ndcg to receive a data frame as its parameter.

ndcg <- function(dataset) {
  predicted_rank <- dataset$predicted
  actual_rank <- dataset$actual
  # DCG of a vector of relevance scores
  DCG <- function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2))
  DCG(predicted_rank)/DCG(actual_rank)
}
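Before packaging it, you can sanity-check the data-frame version locally against the original two-argument call (the `perf_df` data frame is the one defined in the question):

```r
perf_df <- data.frame(predicted = c(5, 7, 20),
                      actual    = c(4, 6, 40))

# Both forms should agree, since the new version just unpacks the columns
ndcg(perf_df)  # roughly 0.699 for this data, matching ndcg(perf_df$predicted, perf_df$actual)
```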

And you put that in a package called ndcg_package

Now your code will look similar to:

spark_apply(perf_tbl, ndcg, packages = TRUE, names = "ndcg")
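If building a package feels like overkill for one helper, an alternative sketch (untested here) is to make the closure fully self-contained: define `DCG` inside the anonymous function passed to spark_apply, so everything the workers need is serialized with the closure rather than looked up in your global environment, which the workers don't share.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

perf_tbl <- sdf_copy_to(sc,
                        data.frame(predicted = c(5, 7, 20),
                                   actual    = c(4, 6, 40)),
                        name = "perf_table")

# DCG is defined inside the closure, so it travels to the workers
# along with the rest of the function body.
perf_tbl %>%
  spark_apply(function(e) {
    DCG <- function(y) y[1] + sum(y[-1] / log(2:length(y), base = 2))
    DCG(e$predicted) / DCG(e$actual)
  },
  names = "ndcg")
```

This sidesteps the packaging step, at the cost of duplicating the helper if you need it in several places.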

Doing this from memory, so there may be a few typos, but it'll get you close.
