Aggregate by string column name in R
I would like to group data in a data.frame by two columns and then sum a specific third column. For example:
> aggregate(mpg~gear+cyl, data=mtcars, FUN=sum)
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8
Now, I need to do this several times for different columns, so I would like to write a function that generalizes it. It takes the data.frame and one of the columns (to keep things simple) and does the same thing.
agg.data <- function(df, colname) {
  aggregate(mpg~gear+colname, data=df, FUN=sum)
}
Running this produces:
Error in eval(expr, envir, enclos) : object 'colname' not found
How can I pass the value of colname in to aggregate?
Paste together a string representation of your formula, and give that string as an argument to formula():
agg.data <- function(df, colname) {
  aggregate(formula(paste0("mpg~gear+", colname)), data=df, FUN=sum)
}
> agg.data(mtcars, "cyl")
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8
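As a possible alternative to pasting the formula text by hand, base R's reformulate() builds a formula from a character vector of term labels. The following is only a sketch under the same mtcars assumptions; the function name agg.data.rf and its value parameter are made up for illustration and are not part of the original answer.

```r
# Sketch: build the formula with stats::reformulate() instead of paste0().
# agg.data.rf and its 'value' argument are hypothetical names.
agg.data.rf <- function(df, colname, value = "mpg") {
  # reformulate(c("gear", "cyl"), response = "mpg") yields mpg ~ gear + cyl
  f <- reformulate(c("gear", colname), response = value)
  aggregate(f, data = df, FUN = sum)
}

res <- agg.data.rf(mtcars, "cyl")
```

Because the grouping terms are passed as a character vector, adding a second variable grouping column would only mean extending that vector.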
Using data.table:
fun.dt <- function(dt, col) {
  dt[, .(mpg=sum(mpg)), by=c("gear", col)]
}
require(data.table)
dt = as.data.table(mtcars)
fun.dt(dt, "cyl")
#    gear cyl   mpg
# 1:    4   6  79.0
# 2:    4   4 215.4
# 3:    3   6  39.5
# 4:    3   8 180.6
# 5:    3   4  21.5
# 6:    5   4  56.4
# 7:    5   8  30.8
# 8:    5   6  19.7
The by expression in data.table can take a character vector of column names, in addition to lists of columns/expressions, so we can simply provide a character vector to the by argument.
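The same idea can be pushed a step further so that the summed column is also given as a string. This is a minimal sketch, assuming the data.table package is installed; sum.by is a hypothetical helper name, and it relies on .SDcols to select the value column.

```r
library(data.table)

# Sketch: both the grouping columns and the summed column arrive as strings.
# sum.by is a hypothetical helper name, not from the original answer.
sum.by <- function(dt, group_cols, value_col) {
  # .SDcols restricts .SD to the column(s) named in value_col;
  # by accepts the character vector of grouping columns directly.
  dt[, lapply(.SD, sum), by = group_cols, .SDcols = value_col]
}

dt <- as.data.table(mtcars)
res.dt <- sum.by(dt, c("gear", "cyl"), "mpg")
```

The result keeps the original column names (gear, cyl, mpg), with rows in order of first appearance of each group.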
You can easily use the "normal" aggregate interface (i.e. not the formula interface) to supply column names in variables. The syntax is slightly different but still easy enough, and it doesn't require pasting:
agg.data2 <- function(df, colname) {
  aggregate(df[["mpg"]], list(df[["gear"]], df[[colname]]), FUN=sum)
}
agg.data2(mtcars, "cyl")
#   Group.1 Group.2     x
# 1       3       4  21.5
# 2       4       4 215.4
# 3       5       4  56.4
# 4       3       6  39.5
# 5       4       6  79.0
# 6       5       6  19.7
# 7       3       8 180.6
# 8       5       8  30.8
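The output above loses the original column names (Group.1, Group.2, x). One way to keep them, sketched here as an assumption rather than as part of the original answer, is to index the data frame with single brackets so that aggregate() receives named lists. The names agg.data.named and value are hypothetical.

```r
# Sketch: pass data-frame subsets (named lists) so the result keeps column names.
# agg.data.named and its 'value' argument are hypothetical names.
agg.data.named <- function(df, colname, value = "mpg") {
  # df[value] and df[c("gear", colname)] are data frames, i.e. named lists,
  # so aggregate() labels the output columns with those names.
  aggregate(df[value], by = df[c("gear", colname)], FUN = sum)
}

res.named <- agg.data.named(mtcars, "cyl")
```

With this variant the result columns should be named gear, cyl and mpg, matching the formula interface.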
Here's the dplyr equivalent:
library(dplyr)
agg.data.dplyr <- function(df, colname) {
  df %>%
    group_by_(.dots = c("gear", colname)) %>%
    summarise(sum = sum(mpg)) %>%
    ungroup()
}
agg.data.dplyr(mtcars, "cyl")
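Note that group_by_() has been deprecated since dplyr 0.7.0. A rough equivalent for newer releases, assuming dplyr 1.0.0 or later, is to combine across() with all_of(). The function name agg.data.across is made up for this sketch.

```r
library(dplyr)

# Sketch for dplyr >= 1.0.0: group by columns named in a character vector.
# agg.data.across is a hypothetical name, not from the original answer.
agg.data.across <- function(df, colname) {
  df %>%
    # all_of() selects exactly the columns named in the character vector
    group_by(across(all_of(c("gear", colname)))) %>%
    # .groups = "drop" returns an ungrouped result, replacing the ungroup() step
    summarise(sum = sum(mpg), .groups = "drop")
}

res.dplyr <- agg.data.across(mtcars, "cyl")
```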
You can also pass an unquoted column name using deparse and substitute:
agg.data <- function(df, colname) {
  aggregate(df$mpg, list(df$gear, df[, deparse(substitute(colname))]), FUN=sum)
}
agg.data(mtcars, cyl)
#   Group.1 Group.2     x
# 1       3       4  21.5
# 2       4       4 215.4
# 3       5       4  56.4
# 4       3       6  39.5
# 5       4       6  79.0
# 6       5       6  19.7
# 7       3       8 180.6
# 8       5       8  30.8
You can also do this in the style of ggplot or with, which lets you write the column names as they are, without passing a string, by using substitute:
agg.data3 = function (df, colname){
  colname = substitute(colname)
  colname = as.character(colname)
  # aggregate the df argument rather than a hard-coded mtcars
  aggregate(formula(paste0("mpg~gear+", colname)), data=df, FUN=sum)
}
Usage:
agg.data3(mtcars, cyl)