Keep the column names of a data.table in a user-defined function
Let's say that I have the following data.table object:
library(data.table)
dt <- data.table(
x = c(1, 2, 3, 4, 5),
y = c(1, 1, 3, 4, 5),
z = c(1, 1, 1, 4, 5)
)
I want to count the number of unique values of any set of stats (columns), raise it to a power y, and return the result in a data.table, keeping the names of the stats.
I want to do something like the following:
foo <- function(stats, y){
lapply(stats, function(stat){length(unique(stat))^y})
}
dt[, foo(.(x, y), 2)]
## V1 V2
## 1: 25 16
but I expect the output to be:
dt[, foo(.(x, y), 2)]
## x y
## 1: 25 16
Note that doing this:
dt[, foo(.(x=x, y=y), 2)]
## x y
## 1: 25 16
or this:
dt[, foo(data.table(x, y), 2)]
## x y
## 1: 25 16
will work, but I think the syntax I suggested earlier looks better. Is it possible to tweak the foo function to do this, or do I have to somehow modify the .( function directly in the data.table package?
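The difference between these calls comes down to names: .(x, y) is data.table's alias for list(x, y), which is an unnamed list, and lapply() only carries over the names its input already has. data.table then labels the unnamed results V1, V2, whereas data.table(x, y) and .(x = x, y = y) supply names explicitly. A minimal base-R sketch of that behaviour (no data.table needed; the vectors are copied from dt above):

```r
# Base-R illustration of why the column names disappear.
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 3, 4, 5)

# Same foo() as in the question: lapply() keeps the names of `stats`, if any.
foo <- function(stats, y) {
  lapply(stats, function(stat) length(unique(stat))^y)
}

unnamed <- foo(list(x, y), 2)          # what .(x, y) passes in
named   <- foo(list(x = x, y = y), 2)  # what .(x = x, y = y) passes in

names(unnamed)  # NULL, so data.table falls back to V1, V2
names(named)    # "x" "y"
unlist(named)   # x = 25, y = 16
```

So the fix has to happen inside foo (recovering the names from the call) or at the call site (naming the list elements).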
Here are two potential workarounds. The first one is what you are requesting:
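foo <- function(stat, x){
  # stat arrives as a list of columns, e.g. list(x, y); x is the exponent
  DF <- lapply(stat, function(stat2){length(unique(stat2))^x})
  # substitute(stat) is the unevaluated call, e.g. .(x, y); dropping the
  # function name with [-1] leaves the argument expressions, deparsed as names
  names(DF) <- sapply(substitute(stat)[-1], deparse)
  return(DF)
}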
dt[, foo(.(x, y), 2)]
x y
1: 25 16
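The trick above is that substitute(stat) returns the unevaluated call (for example list(x, y)), and deparsing its arguments yields the labels. As one possible extension, here is a sketch of a hypothetical variant that also respects explicit names such as .(n_x = x, y), and leaves a pre-built list's own names alone when no call is visible. It is exercised on a plain list() so it runs without data.table:

```r
# Hypothetical variant of foo(): labels come from the argument expressions,
# but explicit names (e.g. n_x = x) take precedence.
foo <- function(stat, x) {
  out  <- lapply(stat, function(s) length(unique(s))^x)
  expr <- substitute(stat)
  if (!is.call(expr)) return(out)  # e.g. a list variable: keep its own names
  # Drop element 1 (the function name: list or .) to keep only the arguments.
  exprs  <- as.list(expr)[-1]
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                   character(1))
  given <- names(exprs)
  if (!is.null(given)) {
    labels[given != ""] <- given[given != ""]
  }
  names(out) <- labels
  out
}

x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 3, 4, 5)
unlist(foo(list(x, y), 2))        # x = 25, y = 16
unlist(foo(list(n_x = x, y), 2))  # n_x = 25, y = 16
```

Inside data.table, dt[, foo(.(x, y), 2)] should then behave like the version above, while dt[, foo(.(n_x = x, y), 2)] would label the first column n_x.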
I think this one is probably just as user-friendly and might be more powerful. If you are asking about data.table, you should try to use its strengths.
foo2 <- function(DT, exponent, SD_cols , by_v = NULL){
DT[,
lapply(.SD, function(stat) {length(unique(stat))^exponent}),
.SDcols = SD_cols,
by = by_v]
}
foo2(dt, 2, c('x','y'), by_v = 'z')
z x y
1: 1 9 4
2: 4 1 1
3: 5 1 1
foo2(dt, 2, c('x', 'y'))
x y
1: 25 16
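A small variation on foo2(): data.table provides uniqueN(), which is equivalent to length(unique(x)) for a vector and is typically more efficient, so the per-column function can use it directly. foo3 is a hypothetical name, and this is a sketch rather than a drop-in replacement:

```r
library(data.table)

# Hypothetical foo3(): same interface as foo2(), but counts distinct values
# with data.table's uniqueN() instead of length(unique()).
foo3 <- function(DT, exponent, SD_cols, by_v = NULL) {
  DT[, lapply(.SD, function(stat) uniqueN(stat)^exponent),
     .SDcols = SD_cols, by = by_v]
}

dt <- data.table(
  x = c(1, 2, 3, 4, 5),
  y = c(1, 1, 3, 4, 5),
  z = c(1, 1, 1, 4, 5)
)
foo3(dt, 2, c("x", "y"))              # one row: x = 25, y = 16
foo3(dt, 2, c("x", "y"), by_v = "z")  # one row per value of z
```

Because the column names travel through .SD and .SDcols, no substitute() trickery is needed to keep them.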