Using variable column names in dplyr summarise

Question

I found this question already asked, but without a proper answer: R using variable column names in summarise function in dplyr

I want to calculate the difference between two column means, but the column names should be provided by variables. So far the only function I have found for turning text into a column name is as.name, but somehow it doesn't work here...

With fixed column names it works:

library(dplyr)
x <- c('a','b')
df <- group_by(data.frame(a=c(1,2,3,4), b=c(2,3,4,5), c=c(1,1,2,2)), c)
df %>% summarise(mean(a) - mean(b))

With the column names stored in a variable, it doesn't work:

df %>% summarise(mean(x[1]) - mean(x[2]))                    # mean() receives the string "a", not column a
df %>% summarise(mean(as.name(x[1])) - mean(as.name(x[2])))  # mean() receives the symbol a, not column a
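To see why the first attempt fails, here is a base-R sketch (outside dplyr, so it only illustrates the idea): x[1] is just the character string "a", and mean() on a string emits a warning and returns NA instead of averaging anything. The variable names warned and res are illustrative only.

```r
x <- c("a", "b")

# x[1] is only the character string "a", not the column a
print(class(x[1]))  # "character"

# mean() on a string warns and returns NA; record whether a warning occurred
warned <- FALSE
res <- withCallingHandlers(
  mean(x[1]),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(res)     # NA
print(warned)  # TRUE
```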

Since this was asked three years ago and dplyr is under active development, I am wondering whether there is an answer now.

Answer 1

You can use base::get:

df %>% summarise(mean(get(x[1])) - mean(get(x[2])))

# # A tibble: 2 x 2
#       c `mean(get(x[1])) - mean(get(x[2]))`
#   <dbl>                               <dbl>
# 1     1                                  -1
# 2     2                                  -1

By default, get searches the environment it is called from (and its parent environments), which inside summarise is where the columns are visible.
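The same lookup rule can be seen in plain base R: called with only a string, get() starts from the environment of the calling function. The function name lookup_in_function is hypothetical and only serves the illustration.

```r
# Hypothetical helper: get() resolves the string "a" in this function's own frame
lookup_in_function <- function() {
  a <- c(1, 2, 3, 4)  # a local variable named a, standing in for a column
  col <- "a"          # the name held as a string
  mean(get(col))      # get("a") finds the local a
}

result <- lookup_in_function()
print(result)  # 2.5
```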

As the message says, mean expects a numeric or logical object, but as.name returns a name (a symbol):

class(as.name("a")) # [1] "name"
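A base-R sketch of the difference between a symbol and its value: mean() cannot use the symbol itself, while eval() looks the symbol up in an environment. The environment data_env is a hypothetical stand-in for the columns that summarise makes visible.

```r
# as.name() turns a string into a symbol (class "name"), not into data
sym <- as.name("a")
print(class(sym))  # "name"

# mean() does not accept a symbol: it warns and returns NA
res_sym <- suppressWarnings(mean(sym))
print(res_sym)  # NA

# eval() resolves the symbol in an environment; data_env is a hypothetical
# stand-in holding a column-like vector named a
data_env <- list2env(list(a = c(1, 2, 3, 4)))
res_eval <- mean(eval(sym, data_env))
print(res_eval)  # 2.5
```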

You could also evaluate the name, which works as well:

df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))
# # A tibble: 2 x 2
#       c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
#   <dbl>                                                   <dbl>
# 1     1                                                      -1
# 2     2                                                      -1
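Not part of the original answer: dplyr versions that provide the .data pronoun (assumed here to be 0.7.0 or later) can also look a column up by its string name, group by group. In this sketch the last value of a is changed to 10 (illustrative data only) so that the two groups give different results.

```r
library(dplyr)

x <- c("a", "b")  # column names held in a variable, as in the question

# Illustrative data: a[4] is 10 instead of 4 so the groups differ
df <- group_by(
  data.frame(a = c(1, 2, 3, 10), b = c(2, 3, 4, 5), c = c(1, 1, 2, 2)),
  c
)

# .data[[name]] selects the column whose name is stored in the string, per group
res <- df %>%
  summarise(diff = mean(.data[[x[1]]]) - mean(.data[[x[2]]]))
print(res)
```

For group c = 1 this gives 1.5 - 2.5 = -1, and for c = 2 it gives 6.5 - 4.5 = 2.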

Answer 2

This is not a direct answer to your question, but it may be useful for other people reading your post: it can be easier to refer to the columns directly, for example by position, like

df %>% summarise(someName = mean(.[[1]]) - mean(.[[2]]))
############ which is the same as ############
df %>% summarise(someName = mean(.[,1,drop=T]) - mean(.[,2,drop=T]))

Note that drop=T is needed because with single square brackets the result keeps the class of the object (here . is a data frame, in fact a grouped tibble), and this isn't what we want: mean() needs the column as a vector. Also note that . refers to the whole data frame piped in, not the current group, so these means ignore the grouping by c; in this example both groups happen to give the same result, -1.
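A base-R sketch of the bracket behaviour described above, on a plain data frame named dat (illustrative). For a plain data.frame, dat[, 1] already drops to a vector by default; tibbles default to drop = FALSE, which is why drop=T matters in the answer.

```r
dat <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))

one_col_df <- dat[, 1, drop = FALSE]  # stays a data frame
col_vec_1  <- dat[, 1, drop = TRUE]   # plain numeric vector
col_vec_2  <- dat[[1]]                # [[ ]] extracts the column itself

print(class(one_col_df))                # "data.frame"
print(class(col_vec_1))                 # "numeric"
print(identical(col_vec_1, col_vec_2))  # TRUE
print(mean(col_vec_2))                  # 2.5
```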

2 answers

Answer 1
5 votes, accepted, 2018-08-21 08:52:00

Answer 2
1 vote, 2022-04-11 01:34:09