
How to pass variables to functions called in spark_apply()?

I would like to be able to pass extra variables to functions that are called by spark_apply in sparklyr.

For example:

# setup
library(sparklyr)
sc <- spark_connect(master='local', packages=TRUE)
iris2 <- iris[,1:(ncol(iris) - 1)]
df1 <- sdf_copy_to(sc, iris2, repartition=5, overwrite=T)

# This works fine
res <- spark_apply(df1, function(x) kmeans(x, 3)$centers)

# This does not
k <- 3
res <- spark_apply(df1, function(x) kmeans(x, k)$centers)

As an ugly workaround, I can do what I want by saving values into R packages, and then referencing them, i.e.:

> myPackage::k_equals_three == 3
[1] TRUE

# This also works
res <- spark_apply(df1, function(x) kmeans(x, myPackage::k_equals_three)$centers)

Is there a better way to do this?

I don't have Spark set up to test, but can you just create a closure?

kmeanswithk <- function(k) { force(k); function(x) kmeans(x, k)$centers }
k <- 3
res <- spark_apply(df1, kmeanswithk(k))

Basically, just create a function that returns a function, then use that.
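
For what it's worth, here is a minimal plain-R sketch (nothing sparklyr-specific; make_lazy and make_forced are illustrative names) of why the force(k) call matters: R evaluates function arguments lazily, so without force() the value of k is only captured when the inner function first runs.

# Lazy version: the promise for k is not evaluated until the
# returned function is first called.
make_lazy <- function(k) function() k

# Forced version: force(k) evaluates the argument immediately,
# pinning its value inside the closure.
make_forced <- function(k) { force(k); function() k }

k <- 3
f_lazy <- make_lazy(k)
f_forced <- make_forced(k)
k <- 5

f_lazy()    # 5 -- k was still an unevaluated promise when it changed
f_forced()  # 3 -- k was fixed when the closure was created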

spark_apply() now has a context argument for passing additional objects/variables to the function: whatever context evaluates to is serialized and passed to your function as its second argument.

res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k <- 3})

or

k <- 3
res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k})

The R documentation does not include any examples with the context argument, but you might learn more from reading the PR: https://github.com/rstudio/sparklyr/pull/1107.
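
If you need to pass more than one value, the same context mechanism should also accept a single named list (a sketch under that assumption; the ctx and iter_max names are mine, not from the docs or the PR):

# Bundle several settings into one context object; spark_apply()
# serializes it and hands it to the function as its second argument.
res <- spark_apply(df1, function(x, ctx) {
  kmeans(x, centers = ctx$k, iter.max = ctx$iter_max)$centers
}, context = list(k = 3, iter_max = 20))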
