A simple reproducible example of passing arguments to data.table in a self-defined function in R
I have been googling this for a few hours. A lot of people have asked similar questions, but I did not find either a simple enough question or a straightforward answer. Here is my approach:
Assume that I want to do a simple group by in data.table:
library(data.table)
mtcars = data.table(mtcars)
mtcars[,sum(mpg), gear]
# Here are the results
# gear V1
#1: 4 294.4
#2: 3 241.6
#3: 5 106.9
However, if I use a self-defined function to do this:
zz = function(data, var, group){
  return(data[, sum(var), group])
}
zz(mtcars, mpg, gear)
I got an error message:
Error in eval(bysub, parent.frame(), parent.frame()): object 'gear' not found
I've tried substitute, eval, quote, and other solutions, but none of them works. I wonder if anyone could give a more straightforward solution and explanation for this.
Thank you and happy Halloween!
If we are using unquoted arguments, substitute and evaluate:
zz <- function(data, var, group){
  # capture the unevaluated symbols passed by the caller
  var <- substitute(var)
  group <- substitute(group)
  setnames(data[, sum(eval(var)), by = group],
           c(deparse(group), deparse(var)))[]
  # or use
  # setnames(data[, sum(eval(var)), by = c(deparse(group))], 2, deparse(var))[]
}
zz(mtcars, mpg, gear)
# gear mpg
#1: 4 294.4
#2: 3 241.6
#3: 5 106.9
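To see why this helps, here is a minimal base R sketch of what substitute captures. The helper name show_args is hypothetical and only for illustration. In the original zz, data.table looks up group in the function's frame and forces the argument, so R searches for an object called gear in the caller's environment, which appears to be where the "object 'gear' not found" error comes from. With substitute, the function holds the symbol itself and never forces the argument.

```r
# Hypothetical helper, for illustration only (not part of data.table).
show_args <- function(var, group) {
  var <- substitute(var)      # captures the symbol the caller typed, unevaluated
  group <- substitute(group)
  list(var_class  = class(var),
       var_name   = deparse(var),
       group_name = deparse(group))
}

info <- show_args(mpg, gear)  # no error, even though mpg and gear are not objects
print(info)

# A captured symbol can later be evaluated with a data frame as its environment.
mpg_values <- eval(quote(mpg), mtcars)
print(sum(mpg_values))
```

deparse turns the symbol back into a string, which is how the answer above recovers the column names for setnames.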
While not perfect, the ... argument can be helpful:
zz = function(dt, ...){
  return(dt[...])
}
zz(mtcars, , sum(mpg), gear)
#    gear    V1
# 1:    4 294.4
# 2:    3 241.6
# 3:    5 106.9
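Because ... forwards every argument untouched, the same wrapper should also accept a row filter in the first position. The following is a small sketch under that assumption; the object names dt and res are made up for this example.

```r
library(data.table)

# Same wrapper as above: everything after dt is handed to `[` as-is.
zz <- function(dt, ...) {
  dt[...]
}

dt <- as.data.table(mtcars)

# First forwarded argument is i (row filter), then j, then by.
res <- zz(dt, cyl == 4, sum(mpg), gear)
print(res)
```

The trade-off is that the wrapper adds nothing beyond the data.table call itself, so it is mainly useful when the function does other work around the call.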
I don't really see the point in writing a function that takes unquoted arguments. Why not just use the data.table syntax directly?
If you want to write a function, it makes more sense to take a character vector of column names, since this is far more programmable than symbols (think about programming with setkey versus setkeyv, or writing functions to create ggplots with aes versus aes_string). The downside is that the internals of the function are messy and require eval(parse(text = ...)) NSE in order for GForce to work correctly, but the function interface is more extensible.
zz = function(data, var, group){
  # build the data.table call as a string, then parse and evaluate it
  eval(parse(text = paste0("data[, sum(", var, "), by = ", group, "]")))
}
zz(mtcars, "mpg", "gear")
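As an alternative sketch that keeps the character-vector interface without building code from strings, .SDcols and a character by can be used. This is not from the answer above; the name zz_chr is hypothetical, and it assumes the data has no columns literally named var or group, which would shadow the function arguments.

```r
library(data.table)

# Hypothetical helper: column names are passed as strings.
zz_chr <- function(data, var, group) {
  # .SDcols restricts .SD to the columns named in `var`;
  # `by` accepts a character vector of grouping column names.
  data[, lapply(.SD, sum), by = group, .SDcols = var]
}

dt <- as.data.table(mtcars)
res <- zz_chr(dt, "mpg", "gear")
print(res)
```

The result keeps the original column names (gear and mpg), and var can hold several column names at once, for example c("mpg", "hp").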