How to avoid repeating code with multiple arguments in a dplyr::mutate() call?

Question

Problem

I am transitioning from base R to dplyr.

I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:

mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))

or

mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data

Goal

The goal is to compute the means of different scales in a data frame from a single call. As you can see, the calls to rowMeans and select and the na.rm argument repeat several times (imagine I have many more variables than in this example).

Attempts

I was trying to come up with an across() solution:

mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{.col}"))

But it doesn't produce the correct outcome, because I don't see how to specify different column arguments for w:z. Using c_across from the documentation example brings us back to repeating code:

mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))

I am tempted to resort to lapply or a custom function, but I feel that would defeat the purpose of adapting to dplyr and the new across() function.

Edit: To clarify, I want to avoid writing rowMeans, select, and na.rm more than once.

Related threads: 1, 2, 3.

Answer 1

We don't need rowwise; instead, use select with rowMeans, which is vectorized. To make this easier, a function can be created:

f1 <- function(dat, nm1) {
  dat %>%
    select({{ nm1 }}) %>%
    rowMeans(na.rm = TRUE)
}

mtcars %>% mutate(w = f1(dat = ., nm1 = mpg:disp),
                  x = f1(dat = ., nm1 = hp:wt),
                  y = f1(dat = ., nm1 = qsec:am),
                  z = f1(dat = ., nm1 = gear:carb))
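
If even the four f1() calls feel repetitive, one possible variation is to keep the scale definitions in a named list and map over it, so that rowMeans(), select() and na.rm are each written once. This is only a sketch: the `scales` list and the explicit column names in it are assumptions that spell out the ranges from the question (in mtcars, mpg:disp covers mpg, cyl and disp, and so on).

```r
library(dplyr)
library(purrr)

# Hypothetical lookup: new column name -> columns that make up the scale
scales <- list(
  w = c("mpg", "cyl", "disp"),
  x = c("hp", "drat", "wt"),
  y = c("qsec", "vs", "am"),
  z = c("gear", "carb")
)

# rowMeans(), select() and na.rm each appear exactly once
scale_means <- map(scales, ~ rowMeans(select(mtcars, all_of(.x)), na.rm = TRUE))

# Splice the named list into mutate(): one new column per list element
result <- mtcars %>% mutate(!!!scale_means)
```

Adding a scale then only means adding one entry to `scales`.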

Answer 2

Use a custom function (but organize it a bit differently to reduce repeated code):

mm <- function(data, new_col, cols_to_mut) {
    data %>%
        mutate(
            {{ new_col }} := mean(c_across({{ cols_to_mut }}), na.rm=TRUE)
        )
}

mtcars %>% 
    rowwise %>% 
    mm(w, mpg:disp) %>%
    mm(x, hp:wt) %>%
    mm(y, qsec:am) %>%
    mm(z, gear:carb) %>%
    ungroup

Answer 3

Consider using purrr::reduce2 to avoid the repetition:

mtcars %>% 
  reduce2(
    c("w","x", "y", "z"),
    c("mpg:disp", "hp:wt","qsec:am","gear:carb"),
    ~ ..1 %>% rowwise %>% mutate(!!..2 := mean(c_across(!!rlang::parse_expr(..3)), na.rm=TRUE)),
    .init = .)



# A tibble: 32 x 15
# Rowwise: 
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb     w     x     y     z
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  62.3  38.8  5.82   4  
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  62.3  38.9  6.01   4  
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  44.9  33.1  6.87   2.5
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  95.1  38.8  6.81   2  
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 129.   60.5  5.67   2.5
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  83.0  37.1  7.07   2  
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 127.   83.9  5.28   3.5
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  58.4  23.0  7      3  
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  55.9  34.0  7.97   3  
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  64.3  43.5  6.43   4  
# ... with 22 more rows
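
As a possible variation on the reduce2 idea, the column ranges can be stored as unevaluated expressions instead of strings, which avoids parse_expr, and rowMeans can replace rowwise. This is a sketch under assumptions: the `ranges` name and the three-argument helper function are illustrative and not part of the answer above.

```r
library(dplyr)
library(purrr)
library(rlang)

# Unevaluated tidyselect ranges, keyed by the new column name (hypothetical name)
ranges <- exprs(w = mpg:disp, x = hp:wt, y = qsec:am, z = gear:carb)

# Fold over the (name, range) pairs, adding one mean column per step
result <- reduce2(
  names(ranges), ranges,
  function(acc, nm, rng) {
    mutate(acc, !!nm := rowMeans(select(acc, !!rng), na.rm = TRUE))
  },
  .init = mtcars
)
```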
