Make R user-defined function for data.table commands - How to refer to a column properly

I have the data df1:

df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=c(2014,2014,2015,2015),
                        month=c(1,2),
                        new.employee=c(4,6,2,6,23,2,5,34))

  id year month new.employee
1  A 2014     1            4
2  A 2014     2            6
3  A 2015     1            2
4  A 2015     2            6
5  B 2014     1           23
6  B 2014     2            2
7  B 2015     1            5
8  B 2015     2           34

The expected result, produced by the following code:

library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1

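Now I want to do this kind of data mining through a user-defined function, so that I can vary the input, which is new.employee in the example above. I tried a few approaches, but none of them worked: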

  1. First attempt:

     myRank <- function(data, var) {
       temp <- setDT(data)[month == 2L, .(id, frank(-var)), by = year]
       data[temp, new.employee.rank := i.V2, on = c("year", "id")]
       return(data)
     }
     myRank(df1, new.employee)

    Error in is.data.frame(x) : object 'new.employee' not found

  2. Second attempt:

     myRank(df1, df1$new.employee)

Nothing is printed.

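  3. Third attempt: I changed the function slightly

     myRank <- function(data, var) {
       temp <- setDT(data)[month == 2L, .(id, rank(data$var)), by = year]
       data[temp, new.employee.rank := i.V2, on = c("year", "id")]
       return(data)
     }

    myRank(df1, df1$new.employee)
    Warning messages:
    1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
    2: In `[.data.table`(setDT(data), month == 2L, .(id, rank(data$var)),  :
      Item 2 of j's result for group 1 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
    3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'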

I have looked at similar questions, but my R experience is not enough to understand them.

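data.table uses non-standard evaluation by default (unless you start to mess around with with = FALSE), so you will need to refer to your column by name or, alternatively, use get. Another problem with your code (as mentioned in the comments) is that you are calling new.employee, which is not defined outside the scope of df1. If you want to prevent R from evaluating it before it is passed to your data set, you can use the deparse(substitute(var)) combination, which prevents evaluation and converts var to a character string. That string can then be passed to get or to the eval(as.name()) combination (which does something entirely different, but gives the same result within the data.table scope). Finally, there is the printing issue after using := within a function. Even if everything works, return(data) will not print anything; you need to force printing by appending an additional [] or by calling print explicitly.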
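To make the capture step concrete, here is a minimal sketch in base R only (no data.table required). The helper name capture_name and the small list cols are assumptions made up for this illustration; they are not part of the original code.

```r
# Hypothetical helper, used only to illustrate argument capture.
capture_name <- function(var) {
  # substitute() returns the unevaluated expression the caller typed;
  # deparse() turns that expression into a character string.
  deparse(substitute(var))
}

# No object called new.employee exists here, yet no error is raised,
# because the argument is never evaluated.
col <- capture_name(new.employee)
col

# A plain list stands in for the table's columns (illustrative data).
cols <- list(new.employee = c(4, 6, 2, 6))

# get() looks a column up from its name given as a string ...
with(cols, get(col))
# ... while eval(as.name()) evaluates a symbol built from that string.
eval(as.name(col), cols)
```

Both lookups return the same vector, which is why either one can be used inside the data.table call.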

Here is a possible solution:

myRank <- function(data, var) {
  var <- deparse(substitute(var)) ## <~~~ Note this
  temp <- setDT(data)[month == 2L, .(id, frank(-get(var))), by = year] ## <~~ Note the get
  data[temp, new.employee.rank := i.V2, on = c("year", "id")][] ## <~~ Note the []
}       
myRank(df1, new.employee)
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1

Or

myRank <- function(data, var) {
  var <- as.name(deparse(substitute(var))) ## <~~~ Note additional as.name
  temp <- setDT(data)[month == 2L, .(id, frank(-eval(var))), by = year] ## <~ Note the eval
  data[temp, new.employee.rank := i.V2, on = c("year", "id")][]
} 
myRank(df1, new.employee)
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1

I would guess that the second option is faster, as it avoids extracting the whole column from data.
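That guess can be checked empirically. The following is a rough sketch, assuming a made-up table dt of 100,000 rows; the names dt, var_chr and var_sym are hypothetical and exist only for this comparison. Timings depend on the machine and on the data.table version, so no numbers are claimed here.

```r
library(data.table)

set.seed(1)
# Hypothetical larger table, created only for timing purposes.
dt <- data.table(year = sample(2000:2020, 1e5, replace = TRUE),
                 new.employee = runif(1e5))

var_chr <- "new.employee"      # column name as a string, for get()
var_sym <- as.name(var_chr)    # the same name as a symbol, for eval()

# Both forms should produce the same per-year ranks.
r_get  <- dt[, .(rnk = frank(-get(var_chr))), by = year]
r_eval <- dt[, .(rnk = frank(-eval(var_sym))), by = year]
all.equal(r_get, r_eval)

# Compare the elapsed times of the two variants.
system.time(dt[, .(rnk = frank(-get(var_chr))), by = year])
system.time(dt[, .(rnk = frank(-eval(var_sym))), by = year])
```

For a more reliable comparison, repeat each call several times rather than relying on a single system.time run.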


As a side note, you could also derive the name of the new variable from the input by replacing

new.employee.rank := i.V2

with something like

paste0("New.", var, ".rank") := i.V2 
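Putting the pieces together, a complete version with a generated column name might look like the sketch below. The function name myRank2, the intermediate column name rnk and the "<column>.rank" naming scheme are assumptions chosen for this example. The input is built directly as a data.table here, with the same values as df1 above.

```r
library(data.table)

# Hypothetical variant of myRank that names the rank column after the input.
myRank2 <- function(data, var) {
  var <- deparse(substitute(var))       # column name as a string
  new_col <- paste0(var, ".rank")       # e.g. "new.employee.rank"
  # Rank the chosen column within each year, using month 2 only.
  temp <- setDT(data)[month == 2L, .(id, rnk = frank(-get(var))), by = year]
  # The parentheses make data.table evaluate new_col rather than
  # create a column literally called "new_col".
  data[temp, (new_col) := i.rnk, on = c("year", "id")][]
}

df1 <- data.table(id = rep(c("A", "B"), each = 4),
                  year = rep(c(2014, 2014, 2015, 2015), 2),
                  month = rep(c(1, 2), 4),
                  new.employee = c(4, 6, 2, 6, 23, 2, 5, 34))

res <- myRank2(df1, new.employee)
names(res)
```

Wrapping the left-hand side of := in parentheses is the documented way to assign to a column whose name is stored in a variable.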
