
Sparklyr cannot reference table in spark_apply

I want to use spark_apply to iterate through a number of data processes for feature generation. To do that I need to reference tables already loaded into Spark, but I get the following error:

ERROR sparklyr: RScript (3076) terminated unexpectedly: object 'ref_table' not found

A reproducible example:

library(sparklyr)

sc <- spark_connect(master = "local")

ref_table   <- sdf_along(sc, 10)
apply_table <- sdf_along(sc, 10)

# Fails: the worker closure cannot resolve the Spark table `ref_table`
spark_apply(x = apply_table,
            f = function(x) {
              c(x, ref_table)
            })

I know I can reference libraries inside the function, but I'm not sure how to call up the data. I am running a local Spark cluster through RStudio.

Unfortunately, the failure is to be expected here.

Apache Spark, and consequently the platforms built on top of it, doesn't support nested transformations like this one. You cannot use nested transformations, distributed objects, or the Spark context (spark_connection in the case of sparklyr) from worker code.

For a detailed explanation, please check my answer to Is there a reason not to use SparkContext.getOrCreate when writing a spark job?

Your question doesn't give enough context to determine the best course of action here, but in general there are two possible solutions:

  • As long as one of the datasets is small enough to be stored in memory, use it directly in the closure as a plain R object (see the first sketch below).
  • Reformulate your problem as a join or a Cartesian product (Spark's crossJoin; see the second sketch below).
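
A minimal sketch of the first option, assuming the reference data is small enough to collect() to the driver. The collected data frame is then shipped to each worker through spark_apply's context argument; the names ref_local and ref_max are illustrative, not from the original question:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

apply_table <- sdf_along(sc, 10)

# Collect the small reference table to the driver so it becomes
# a plain R data frame rather than a Spark table
ref_local <- sdf_along(sc, 10) %>% collect()

spark_apply(
  x = apply_table,
  f = function(df, ctx) {
    # On the worker, `ctx` is an ordinary R data frame
    df$ref_max <- max(ctx$id)
    df
  },
  context = ref_local  # serialized and passed to each call of `f`
)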
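
And a sketch of the second option, which keeps everything inside Spark. Instead of calling crossJoin through the lower-level invoke API, a constant dummy key turns an ordinary inner_join into a Cartesian product that sparklyr translates to Spark SQL; the dummy column name and the suffixes are arbitrary choices here:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

ref_table   <- sdf_along(sc, 10)
apply_table <- sdf_along(sc, 10)

# Joining on a constant key pairs every row of one table with every
# row of the other; `suffix` disambiguates the shared `id` column
crossed <- apply_table %>%
  mutate(dummy = 1) %>%
  inner_join(ref_table %>% mutate(dummy = 1),
             by = "dummy", suffix = c("_apply", "_ref")) %>%
  select(-dummy)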
