Is there a way to access the whole tibble/grouped tibble from within the `mutate()` call acting on it?

Question

For example,

library(dplyr)  # provides %>%, mutate(), pull(), inner_join()

df1 <- expand.grid(x1=1:2,x2=1:2,x3=1:2,x4=1:2,x5=1:2,x6=1:2) %>%
 mutate(
  x7 = sample(1:2, 64, replace = TRUE),
  y1 = rnorm(64)
 )

df2 <- expand.grid(x1=1:2,x2=1:2,x3=1:2,x4=1:2,x5=1:2,x6=1:2) %>%
 mutate(
  x7 = sample(1:2, 64, replace = TRUE),
  y2 = rnorm(64)
 )

myfunc <- function(data){
    data %>%
     mutate(key = paste(x1,x2,x3,x4,x5,x6)) %>%
     pull(key)
}

joined_df <- df1 %>%
 mutate(y3 = runif(64)) %>%
 mutate(key = myfunc([some sort of expression referencing df1])) %>%
 inner_join(
  df2 %>%
   mutate(y4 = runif(64)) %>%
   mutate(key = myfunc([some sort of expression referencing df2])),
  by = 'key'
 )

Essentially, I want to avoid having a function recreate the whole data frame, like this one:

myfunc_v2 <- function(data){
    data %>%
     mutate(key = paste(x1,x2,x3,x4,x5,x6)) 
}

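Although a case can be made for `myfunc_v2()`, the main reason I want to avoid it is that I usually change variable names with transformation functions such as `rename_all()` to reconcile sources that are formatted differently. I do not want to actually modify the names in the main copy, because I keep the column-name format of one of the tibbles and then discard the other's.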

Answer 1

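The solution is simple.

When you use the pipe operator `%>%`, which is how dplyr is normally used, you can specify where in the function call the piped object is placed.

To get a copy of the piped object, all you need to do is put `(.)` wherever you want it, provided it is not inside an anonymous function (for example, in `mutate_all(data, list(scaled=~scale(.), signed=~sign(.)))` the dot refers to each column, not to the data frame).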
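The two meanings of the dot can be seen side by side in a small sketch. The tibble `toy` and the column names below are made up for illustration and are not part of the original question.

```r
library(dplyr)

# `toy` is a hypothetical two-column tibble used only for illustration.
toy <- tibble(a = c(1, -2, 3), b = c(-4, 5, -6))

# Nested inside another call, `.` is the whole tibble arriving through the pipe.
# The tibble is still passed as the first argument of mutate().
with_n <- toy %>% mutate(n_cols = ncol(.))

# Inside a `~` formula, `.` is instead the column currently being transformed.
signed <- toy %>% mutate_all(list(signed = ~ sign(.)))
```

In `with_n` every row gets the column count of `toy`, while `signed` gains one sign column per original column.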

The solution then looks like this:

joined_df <- df1 %>%
 mutate(y3 = runif(64)) %>%
 mutate(key = myfunc((.))) %>%
 inner_join(
  df2 %>%
   mutate(y4 = runif(64)) %>%
   mutate(key = myfunc((.))),
  by = 'key'
 )
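A quick way to check what `(.)` refers to at each step is to run the pattern on a tiny input. The sketch below uses a hypothetical four-row tibble `small` and a shortened `myfunc_small()`, both made up for illustration. It also suggests that after `group_by()` the dot is still the whole tibble rather than the current group; for the current group dplyr offers `pick()` (this assumes dplyr 1.1.0 or later).

```r
library(dplyr)

# Hypothetical stand-ins for df1 and myfunc(), kept small enough to inspect.
small <- tibble(x1 = c(1, 2, 1, 2), x2 = c(1, 1, 2, 2))

myfunc_small <- function(data) {
  data %>%
    mutate(key = paste(x1, x2)) %>%
    pull(key)
}

# `(.)` is the tibble as it enters this mutate(), so it already contains y3.
keyed <- small %>%
  mutate(y3 = c(0.1, 0.2, 0.3, 0.4)) %>%
  mutate(key = myfunc_small((.)), n_cols_seen = ncol((.)))

# With groups, `(.)` is still the whole (grouped) tibble;
# pick(everything()) is the slice for the current group.
grouped <- small %>%
  group_by(x1) %>%
  mutate(
    rows_in_dot   = nrow((.)),
    rows_in_group = nrow(pick(everything()))
  )
```

If a per-group key is what is needed, passing `pick(everything())` instead of `(.)` is one option, with the caveat that `pick()` leaves out the grouping columns.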

Solution 1
0 2021-11-17 06:23:39
