Applying multiple functions per row with c_across in dplyr

Question

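Is it possible to use rowwise() and c_across() to apply multiple functions to each row within a single c_across statement? With across() we can supply a list of functions to apply to each column; does the same work for c_across()?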

library(dplyr)

# sample data
df <- tibble(a = c(1, 2, 3, 25, 1),
             b = c(5, 26, 8, 8, 3),
             c = c(9, 10, 11, 11, 12),
             d = c('a', 'b', 'c', 'd', 'e'),
             e = c(1, 2, 3, 4, 7))


#This will work

df %>% 
  rowwise() %>% 
  mutate(max = max(c_across(where(is.numeric)), na.rm = TRUE),
         min = min(c_across(where(is.numeric)), na.rm = TRUE),
         avg = mean(c_across(where(is.numeric)), na.rm = TRUE))

# A tibble: 5 x 8
# Rowwise: 
      a     b     c d         e   max   min   avg
  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1     1     5     9 a         1     9     1  4.33
2     2    26    10 b         2    26     2 11.3 
3     3     8    11 c         3    11     3  6.5 
4    25     8    11 d         4    25     4 12.8 
5     1     3    12 e         7    12     1  6  

#This returns errors
df %>% 
  rowwise() %>% 
  mutate(c_across(where(is.numeric)), 
  list( mean = ~mean(., na.rm = TRUE), min = ~min(., na.rm = TRUE),
      max = ~max(., na.rm = TRUE)))

Answer 1

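I don't think your "This will work" example actually works. Each expression in mutate() sees the columns created by the expressions before it, so when avg is computed, where(is.numeric) also selects the new max and min columns. For row 1, the mean of columns a, b, c, and e should be 16/4 = 4, not 26/6 = 4.33. But I think I understand what you are after...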
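The effect is easy to check on the sample data. The following is a minimal sketch, assuming dplyr 1.0.0 or later; naming the original columns explicitly (a, b, c, e) is just one illustrative way to keep newly created columns out of the selection, and the names `contaminated` and `clean` are made up for this example.

```r
library(dplyr)

df <- tibble(a = c(1, 2, 3, 25, 1),
             b = c(5, 26, 8, 8, 3),
             c = c(9, 10, 11, 11, 12),
             d = c('a', 'b', 'c', 'd', 'e'),
             e = c(1, 2, 3, 4, 7))

# The pattern from the question: the selection is made again for every
# expression, so later expressions can pick up columns created earlier
# in the same mutate() call (the question's output shows avg = 4.33 for row 1).
contaminated <- df %>%
  rowwise() %>%
  mutate(max = max(c_across(where(is.numeric)), na.rm = TRUE),
         min = min(c_across(where(is.numeric)), na.rm = TRUE),
         avg = mean(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()

# Naming the original columns explicitly keeps max and min out of avg.
clean <- df %>%
  rowwise() %>%
  mutate(max = max(c_across(c(a, b, c, e)), na.rm = TRUE),
         min = min(c_across(c(a, b, c, e)), na.rm = TRUE),
         avg = mean(c_across(c(a, b, c, e)), na.rm = TRUE)) %>%
  ungroup()

# Compare the two averages side by side.
print(contaminated$avg)
print(clean$avg)
```

With the explicit selection, the averages are computed over the four original numeric columns only, which matches the hand calculation of 16/4 = 4 for row 1.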

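Admittedly this feels a bit hacky, but it works (at least on the sample data). The idea is to have across() return a tibble built from a series of c_across() computations. Because the functions are all evaluated against the original data (df) before any new column is added, it at least gives the means I would personally expect.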

df %>% 
  rowwise() %>% 
  mutate(
    across(1,
      list( mean = ~mean(c_across(where(is.numeric)), na.rm = TRUE),
            min = ~min(c_across(where(is.numeric)), na.rm = TRUE),
            max = ~max(c_across(where(is.numeric)), na.rm = TRUE)),
      .names = "{.fn}"
    )
  )
#> # A tibble: 5 x 8
#> # Rowwise: 
#>       a     b     c d         e  mean   min   max
#>   <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1     1     5     9 a         1  4        1     9
#> 2     2    26    10 b         2 10        2    26
#> 3     3     8    11 c         3  6.25     3    11
#> 4    25     8    11 d         4 12        4    25
#> 5     1     3    12 e         7  5.75     1    12

Created on 2021-04-30 by the reprex package (v1.0.0)

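Because the supplied functions all ignore the column that across() actually selects (they use c_across(where(is.numeric)) to determine the relevant columns rather than the column passed in by across()), you can use any single valid column as the first argument to across(). The key is to supply exactly one column so that each function is evaluated only once per row. I used column 1 because it is generally safe to assume that your data.frame has at least one column.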
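The same "evaluate once per row" idea can also be written without the placeholder column. This is a sketch of an alternative, not the method used above: it assumes dplyr 1.0.0 or later, where an unnamed data-frame result inside mutate() is unpacked into separate columns. The names `fns` and `row_stats` are hypothetical helpers introduced only for illustration.

```r
library(dplyr)

df <- tibble(a = c(1, 2, 3, 25, 1),
             b = c(5, 26, 8, 8, 3),
             c = c(9, 10, 11, 11, 12),
             d = c('a', 'b', 'c', 'd', 'e'),
             e = c(1, 2, 3, 4, 7))

# Hypothetical named list of summary functions to apply to each row.
fns <- list(mean = mean, min = min, max = max)

# Hypothetical helper: apply every function in `fns` to one row's values
# and return the results as a one-row tibble (one column per function).
row_stats <- function(x) {
  as_tibble(lapply(fns, function(f) f(x, na.rm = TRUE)))
}

result <- df %>%
  rowwise() %>%
  # c_across() is called once per row, before any new column exists,
  # and the unnamed tibble is spliced into columns mean, min, max.
  mutate(row_stats(c_across(where(is.numeric)))) %>%
  ungroup()

print(result)
```

Because the row values are collected in a single c_across() call, none of the summaries can leak into another, and adding a further statistic only requires another entry in `fns`.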
