Anonymous function in R using sparklyr spark_apply

Question

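I am trying to use the spark_apply() function from library(sparklyr). I am using spark_apply() because the sparklyr package does not support subsetting. I am a bit lost about where function(e) needs to go in the following dplyr syntax.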

Here is the original syntax that I am trying to adapt with an anonymous function (I am not 100% sure that is the right term):

match_cat3 <- match_cat2 %>%
  group_by(VarE, VarF) %>%
  mutate(Var_G = if (any(Var_C == 1)) ((VarG - VarG[Var_C == 1]) / (Var_G + Var_G[Var_C == 1]) / 2) else NA)

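Here is my attempt at using spark_apply() with the mutate expression above. I would appreciate some help with how to use function(e) and where e goes in the syntax. I have no experience using one function inside another like this.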

match_cat3 <- spark_apply(
                    function(e)
                    match_cat2 %>%
                    group_by(e$VarE, e$VarF) %>%
                    mutate(e$Var_G = if(any(e$Var_C ==1)) ((e$VarG - e$VarG[e$Var_C == 1])/(e$Var_G + e$Var_G[e$Var_C == 1])/2) else NA, e)
)

This gives me an out of bounds error.

I based my syntax on the following chunk from the spark_apply() documentation.

trees_tbl %>%
spark_apply(
function(e) data.frame(2.54 * e$Girth, e),
names = c("Girth(cm)", colnames(trees)))

Thanks!

Answer 1

You seem to be having trouble writing the sparklyr::spark_apply() call. A template that may be more useful to you starts from your Spark DataFrame.

##### data_sf is a Spark DataFrame that will be sent to all workers for R
data_sf <- sparklyr::copy_to(sc, iris, overwrite = TRUE)

data2_sf <- sparklyr::spark_apply(
  x = data_sf,
  f = function(x) {  ##### data_sf will be the argument passed to this x parameter
    x$Petal_Length <- x$Petal_Length + 10 ##### data_sf will now be converted to an R object used here (Spark doesn't like `Petal.Length` so automatically changes column names)
    return(x)
  })

In your case:

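You are missing the x argument, which is the first argument of sparklyr::spark_apply().
You mean to bring in external data (match_cat2) through the e argument of the anonymous function, but you also mistakenly refer to it inside the function definition.
You are missing the curly braces around your multi-line expression, so you are not defining a function.
You are using dplyr (and magrittr) with incorrect syntax: you can refer to variables as group_by(VarE) rather than group_by(e$VarE).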

Functions are defined as function(data, context) {}, where you can put arbitrary code inside the {}. See Chapter 11.7, Functions.
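To make the function(data, context) shape concrete, here is a minimal sketch. The names add_offset, offset, and apply_on_spark are made up for illustration. The function is called directly on a local data frame; the spark_apply() call is wrapped in a function that is defined but never run, because it assumes a live Spark connection and the data_sf table from the template above.

```r
# Hypothetical example function with the function(data, context) shape.
add_offset <- function(data, context) {
  data$Petal_Length <- data$Petal_Length + context$offset
  data
}

# Call it directly on a plain data frame to see what a worker would compute.
local_part <- data.frame(Petal_Length = c(1.4, 4.7))
out <- add_offset(local_part, list(offset = 10))
print(out)

# Assumed Spark usage (not run here): the value given as `context`
# is handed to f as its second argument.
apply_on_spark <- function(data_sf) {
  sparklyr::spark_apply(x = data_sf, f = add_offset, context = list(offset = 10))
}
```

Testing the function on a small local data frame first is usually easier than debugging it through Spark worker logs.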

You are trying to do something conditional in your if/else (you could also use the ifelse() function here), but I am not sure what your intent is.
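Since the two forms are not interchangeable, a small base R sketch of the difference may help. The vectors var_c and var_g are invented toy data standing in for one group of rows: if/else makes a single decision for the whole group, while ifelse() decides element by element.

```r
# Toy data for a single group (hypothetical values).
var_c <- c(0, 1, 0)
var_g <- c(4, 2, 6)

# if / else: ONE test for the whole group, then a vectorised calculation.
whole_group <- if (any(var_c == 1)) var_g - var_g[var_c == 1] else NA
print(whole_group)

# ifelse(): the test is evaluated separately for every element.
per_row <- ifelse(var_c == 1, var_g, NA)
print(per_row)
```

Which one fits depends on whether the condition describes the group as a whole (as any(Var_C == 1) in the question does) or each row on its own.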

##### Rewritten, maybe helpful?
match_cat3 <- spark_apply(
  x = match_cat2, ##### the Spark DataFrame you give to spark_apply()
  f = function(e) { ##### the opening brace of the function body
    e %>% ##### the function's argument, NOT `match_cat2 %>%`
      group_by(VarE, VarF) %>% ##### remove `e$`
      mutate(Var_G = something_good) ##### not sure of your intent here
  })
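If it helps, here is one possible reading of the something_good placeholder, written in base R so it can be checked locally. This is a sketch under assumptions, not a confirmed answer: it assumes VarG and Var_G in the question are the same column (Var_G), that each group has at most one row with Var_C == 1, and it writes the result to a new column, Var_G_ratio, instead of overwriting Var_G. The names ratio_for_group, demo, and run_on_spark are hypothetical. The Spark wrapper is defined but not run; it uses the group_by argument of spark_apply(), and how the grouping columns come back in the result has not been verified here.

```r
# Hypothetical helper: the question's formula for ONE (VarE, VarF) group.
ratio_for_group <- function(e) {
  ref <- e$Var_G[which(e$Var_C == 1)]  # reference value(s) where Var_C == 1
  e$Var_G_ratio <- if (length(ref) > 0) {
    # Assumption: one reference row per group, so use the first value.
    # Operator order is kept exactly as written in the question.
    (e$Var_G - ref[1]) / (e$Var_G + ref[1]) / 2
  } else {
    NA_real_  # no row with Var_C == 1 in this group
  }
  e
}

# Local check on invented data: split by group, apply, and re-combine.
demo <- data.frame(
  VarE  = c("a", "a", "b"),
  VarF  = c("x", "x", "y"),
  Var_C = c(1, 0, 0),
  Var_G = c(2, 6, 5)
)
groups <- split(demo, list(demo$VarE, demo$VarF), drop = TRUE)
result <- do.call(rbind, lapply(groups, ratio_for_group))
print(result)

# Assumed Spark usage (not run here): f is applied once per group.
run_on_spark <- function(match_cat2) {
  sparklyr::spark_apply(
    x = match_cat2,
    f = ratio_for_group,
    group_by = c("VarE", "VarF")
  )
}
```

Grouping through spark_apply() rather than with dplyr inside the function avoids the case where rows of one group are spread over several partitions and each partition is processed separately.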

Solution 1
1 Accepted 2022-12-16 21:42:26