How to pass the argument of a user-defined function as a column name in data.table?

How can I pass an argument to be used as a data.table column name inside a function? For example, I have a data.table called data1 with columns named 'hours' and 'location'. In the output, I want to find the outliers by location, in a column named 'hours'. I tried substitute(y) and similar approaches, but the output always uses 'y' as the column name. Could anyone help me? Thank you.

mf<-function(data, y){
newy<-as.name(deparse(substitute(y)))
output<-data[,.(y=boxplot.stats(eval(newy))$out),by=.(location)]
return(output)
}
mf(data=data1,y=hours)

It's better to write functions that take character values for choosing columns. In this case, your function can be rewritten as:

mf <- function(data, y){
  output <- data[, boxplot.stats(get(y))['out'], by = .(location)]
  setnames(output, 'out', y)
  return(output)
}

By using [ (rather than $) to subset the output of boxplot.stats, a named list with one element ('out') is returned. So output will have two columns: location and out. Then you just need to rename out to whatever was given for y.
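To make the difference between the two subsetting operators concrete, here is a minimal sketch using a small made-up vector (the values in x are an illustration, not data from the question). It only relies on boxplot.stats from grDevices, which R attaches by default.

```r
# Made-up sample vector; 100 lies far outside the whiskers.
x <- c(1, 2, 3, 4, 100)
s <- boxplot.stats(x)

# Single-bracket subsetting keeps the list structure and the name "out",
# which is what lets data.table create a column called "out".
by_bracket <- s["out"]

# Dollar subsetting drops the list wrapper and returns the bare vector,
# so data.table would have to invent a column name (such as V1).
by_dollar <- s$out

str(by_bracket)
str(by_dollar)
```

Because the one-element list carries its own name, the later setnames call has a known column name ('out') to rename.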

Example:

library(data.table)

set.seed(100)
data1 <- data.table(
  location = state.name,
  hours    = rpois(1000, 12)
)
mf(data = data1, y = 'hours')
#           location hours
#  1:       Delaware    25
#  2:        Georgia    21
#  3:          Idaho     4
#  4:  Massachusetts     5
#  5:       Missouri     7
#  6: South Carolina     5
#  7: South Carolina     6
#  8:   South Dakota    20
#  9:          Texas     5
# 10:           Utah    22
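As a possible alternative, the same character-argument idea can be sketched with .SDcols, which avoids both get() and the rename step. The function name mf_sd is hypothetical and not part of the original answer, and the sketch assumes y names a single existing column.

```r
library(data.table)

# Hypothetical variant of mf(); assumes y is a length-one character vector
# naming a column of data.
mf_sd <- function(data, y) {
  # .SDcols restricts .SD to the column named in y. lapply() keeps that
  # column's name on the result, so no setnames() call is needed.
  data[, lapply(.SD, function(v) boxplot.stats(v)$out),
       by = .(location), .SDcols = y]
}

set.seed(100)
data1 <- data.table(
  location = rep(state.name, 20),  # explicit recycling to 1000 rows
  hours    = rpois(1000, 12)
)
res <- mf_sd(data1, "hours")
names(res)
```

Groups without outliers return a zero-length vector and therefore contribute no rows, as in the version above.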

Non-standard evaluation is tricky and only worth the effort if you get something out of it. data.table uses it for optimization behind the scenes. tidyverse packages use it to allow in-database processing. If there's no benefit (besides not having to type a few quotation marks), there's only a cost.
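If the unquoted call style from the question is still wanted, one way to get it is a thin wrapper around the character-based function. This is only a sketch: mf_nse is a hypothetical name, and it inherits the usual cost of non-standard evaluation, because it captures whatever symbol the caller typed rather than that symbol's value.

```r
library(data.table)

# Character-based function from the answer above.
mf <- function(data, y) {
  output <- data[, boxplot.stats(get(y))['out'], by = .(location)]
  setnames(output, 'out', y)
  return(output)
}

# Hypothetical wrapper: turns the unquoted argument into a string,
# then delegates to mf().
mf_nse <- function(data, y) {
  y_chr <- deparse(substitute(y))
  mf(data, y_chr)
}

set.seed(100)
data1 <- data.table(
  location = rep(state.name, 20),
  hours    = rpois(1000, 12)
)

quoted   <- mf(data1, "hours")
unquoted <- mf_nse(data1, hours)
identical(quoted, unquoted)
```

Note that a call such as mf_nse(data1, col), with col <- "hours" defined by the caller, would make the wrapper look for a column literally named col, which is the kind of surprise the character interface avoids.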
