How to avoid repeating code in a dplyr::mutate() call with multiple arguments?
I am transitioning to dplyr from base R.
I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:
mtcars %>%
  mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
         x = rowMeans(select(., hp:wt), na.rm = TRUE),
         y = rowMeans(select(., qsec:am), na.rm = TRUE),
         z = rowMeans(select(., gear:carb), na.rm = TRUE))
or
mtcars %>%
  rowwise() %>%
  mutate(w = mean(mpg:disp, na.rm = TRUE),
         x = mean(hp:wt, na.rm = TRUE),
         y = mean(qsec:am, na.rm = TRUE),
         z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data
The goal is to compute the means of different scales in a data frame from a single call. As you can see, rowMeans, select, and na.rm are repeated several times (imagine I have many more variables than in this example).
I was trying to come up with an across() solution,
mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))
But it doesn't produce the correct outcome, because I don't see how to specify different column ranges for each of w, x, y, and z. Using c_across, as in the documentation example, brings us back to repeating code:
mtcars %>%
  rowwise() %>%
  mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
         x = mean(c_across(hp:wt), na.rm = TRUE),
         y = mean(c_across(qsec:am), na.rm = TRUE),
         z = mean(c_across(gear:carb), na.rm = TRUE))
I am tempted to resort to lapply or a custom function, but I feel like that would defeat the purpose of adapting to dplyr and the new across() function.
Edit: To clarify, I want to avoid writing rowMeans, select, and na.rm more than once.
We don't need rowwise; instead, use select with rowMeans, which is vectorized. To make this easier, a function can be created:
f1 <- function(dat, nm1) {
  dat %>%
    select({{ nm1 }}) %>%
    rowMeans(na.rm = TRUE)
}
mtcars %>%
  mutate(w = f1(dat = ., nm1 = mpg:disp),
         x = f1(dat = ., nm1 = hp:wt),
         y = f1(dat = ., nm1 = qsec:am),
         z = f1(dat = ., nm1 = gear:carb))
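The same select() plus rowMeans() idea can be taken one step further, so that rowMeans, select, and na.rm each appear only once. The following is a minimal sketch, not part of the original answer: the `scales` list, the `scale_means` and `result` names, and the use of rlang::exprs() to hold the column ranges unevaluated are all illustrative assumptions.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical lookup table: new column name -> tidyselect range of source columns.
# exprs() stores the ranges unevaluated so select() can interpret them later.
scales <- exprs(w = mpg:disp, x = hp:wt, y = qsec:am, z = gear:carb)

# rowMeans(), select(), and na.rm are each written exactly once.
scale_means <- map(scales, function(cols) {
  rowMeans(select(mtcars, !!cols), na.rm = TRUE)
})

# Splice the named list of row-mean vectors into a single mutate() call.
result <- mtcars %>% mutate(!!!scale_means)
```

Adding another scale then only requires a new entry in `scales`; the computation itself is left untouched.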
Use a custom function (but organize it a bit differently to reduce repeated code):
mm <- function(data, new_col, cols_to_mut) {
  data %>%
    mutate(
      {{ new_col }} := mean(c_across({{ cols_to_mut }}), na.rm = TRUE)
    )
}
mtcars %>%
  rowwise() %>%
  mm(w, mpg:disp) %>%
  mm(x, hp:wt) %>%
  mm(y, qsec:am) %>%
  mm(z, gear:carb) %>%
  ungroup()
Consider using purrr::reduce2 to avoid the repetition:
mtcars %>%
  reduce2(
    c("w", "x", "y", "z"),
    c("mpg:disp", "hp:wt", "qsec:am", "gear:carb"),
    ~ ..1 %>%
      rowwise() %>%
      mutate(!!..2 := mean(c_across(!!rlang::parse_expr(..3)), na.rm = TRUE)),
    .init = .)
# A tibble: 32 x 15
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb w x y z
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 62.3 38.8 5.82 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 62.3 38.9 6.01 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 44.9 33.1 6.87 2.5
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 95.1 38.8 6.81 2
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 129. 60.5 5.67 2.5
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 83.0 37.1 7.07 2
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 127. 83.9 5.28 3.5
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 58.4 23.0 7 3
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 55.9 34.0 7.97 3
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 64.3 43.5 6.43 4
# ... with 22 more rows