I would like to be able to pass extra variables to functions that are called by spark_apply in sparklyr.
For example:
# setup
library(sparklyr)
sc <- spark_connect(master='local', packages=TRUE)
iris2 <- iris[,1:(ncol(iris) - 1)]
df1 <- sdf_copy_to(sc, iris2, repartition=5, overwrite=TRUE)
# This works fine
res <- spark_apply(df1, function(x) kmeans(x, 3)$centers)
# This does not
k <- 3
res <- spark_apply(df1, function(x) kmeans(x, k)$centers)
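A quick way to see why the second call fails is to list the function's free variables. `k` is not an argument, so R resolves it in the function's enclosing environment, which for a top-level anonymous function is the driver's global environment. The sketch below uses `codetools::findGlobals()` (codetools ships with R as a recommended package); the conclusion that the worker R sessions never receive the driver's global `k` is an inference from this, not something sparklyr reports directly.

```r
# Diagnose the failing call: which names does the function rely on
# that are neither arguments nor defined inside its body?
k <- 3
f <- function(x) kmeans(x, k)$centers

# merge = FALSE separates called functions from plain variables
globals <- codetools::findGlobals(f, merge = FALSE)
free_vars <- globals$variables
print(free_vars)

# k is free (looked up in the enclosing, i.e. global, environment);
# x is an argument and therefore not listed
print("k" %in% free_vars)
```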
As an ugly workaround, I can do what I want by saving the values in an R package and then referencing them, i.e.:
> myPackage::k_equals_three == 3
[1] TRUE
# This also works
res <- spark_apply(df1, function(x) kmeans(x, myPackage::k_equals_three)$centers)
Is there a better way to do this?
I don't have spark set up to test, but can you just create a closure?
kmeanswithk <- function(k) {force(k); function(x) kmeans(x, k)$centers}
k <- 3
res <- spark_apply(df1, kmeanswithk(k))
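Basically, write a function that returns a function, and pass the returned function to spark_apply(). The returned closure carries k in its own enclosing environment instead of looking it up in the global environment, and force(k) makes sure k is evaluated immediately rather than lazily.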
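The closure idea can be checked locally without Spark. This is a minimal sketch using a multi-line version of the `kmeanswithk()` factory; it shows that the captured `k` lives in the closure's own environment and that `force()` pins its value at creation time. Whether sparklyr serializes that environment together with the function is an assumption here, since the answer itself is untested against Spark.

```r
# Function factory: returns a closure whose enclosing environment
# holds its own copy of k
kmeanswithk <- function(k) {
  force(k)  # evaluate the promise now, not at the closure's first call
  function(x) kmeans(x, k)$centers
}

k <- 3
f3 <- kmeanswithk(k)
k <- 5  # changing the global k afterwards does not affect f3

# The closure's environment contains only the captured k, still equal to 3
print(ls(environment(f3)))
print(get("k", envir = environment(f3)))

# Run the closure on the question's data frame (locally, no Spark)
iris2 <- iris[, 1:(ncol(iris) - 1)]
set.seed(1)
centers <- f3(iris2)
print(dim(centers))
```

Without `force(k)`, the promise for `k` would only be evaluated when `f3` is first called, by which time the global `k` has become 5.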
spark_apply() now has a context argument for passing additional objects (variables, parameters and so on) to the worker environment. The context object is handed to your function as its second argument:
res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k <- 3})
or
k <- 3
res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k})
The R documentation does not include any examples with the context argument, but you might learn more from reading the PR: https://github.com/rstudio/sparklyr/pull/1107 .