
Keep the column names of a data.table in a user-defined function

Let's say that I have the following data.table object

library(data.table)
dt <- data.table(
  x = c(1, 2, 3, 4, 5),
  y = c(1, 1, 3, 4, 5),
  z = c(1, 1, 1, 4, 5)
)

I want to count the number of unique values of any stat, raise that count to a power (the second argument of the function), and return the result in a data.table, keeping the names of the stats.

I want to do something like the following

foo <- function(stats, y){
  lapply(stats, function(stat){length(unique(stat))^y})
}

dt[, foo(.(x, y), 2)]
##     V1 V2
##  1: 25 16

but I expect the output to be

dt[, foo(.(x, y), 2)]
##      x  y
##  1: 25 16

Note that doing this

dt[, foo(.(x=x, y=y), 2)]
##      x  y
##  1: 25 16

or this

dt[, foo(data.table(x, y), 2)]
##      x  y
##  1: 25 16

will work, but I think the syntax I suggested earlier looks better. Is it possible to tweak the foo function to do this, or do I have to somehow tweak the .( function directly in the data.table package?

Here are two potential workarounds. The first one is what you are requesting:

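foo <- function(stat, x){
  # Apply the statistic to each element of the list that was passed in
  DF <- lapply(stat, function(stat2){length(unique(stat2))^x})
  # Recover the names from the unevaluated call, e.g. .(x, y) gives "x", "y"
  names(DF) <- sapply(substitute(stat)[-1], deparse)
  return(DF)
}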

dt[, foo(.(x, y), 2)]
    x  y
1: 25 16
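
To see why the names are lost in the first place and how substitute() gets them back, here is a minimal base R sketch that does not need data.table. The helper arg_names is hypothetical and exists only for illustration. It assumes the list is written literally at the call site, as in list(x, y); if you pass a variable that already holds a list, substitute() sees only that variable's name and the trick does not apply.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 3, 4, 5)

# lapply() copies names from its input; an unnamed list has none to copy
names(lapply(list(x, y), length))          # NULL
names(lapply(list(x = x, y = y), length))  # "x" "y"

# Hypothetical helper (not part of data.table): return the arguments of the
# call as they were typed by the caller
arg_names <- function(stat) {
  expr <- substitute(stat)  # the unevaluated call, e.g. list(x, y)
  # Drop the function name (first element), then deparse each argument
  vapply(as.list(expr[-1]), deparse, character(1))
}

arg_names(list(x, y))                      # "x" "y"
```

The foo function above does the same thing: substitute(stat)[-1] strips the leading function name from the call, and deparse turns each remaining symbol into a string.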

I think this one is probably just as user-friendly and might be more powerful. If you are asking about data.table, you should try to use its strengths.

foo2 <- function(DT, exponent, SD_cols , by_v = NULL){

  DT[,
     lapply(.SD, function(stat) {length(unique(stat))^exponent}),
     .SDcols = SD_cols,
     by = by_v]
}

foo2(dt, 2, c('x','y'), by_v = 'z')
   z x y
1: 1 9 4
2: 4 1 1
3: 5 1 1

foo2(dt, 2, c('x', 'y'))
    x  y
1: 25 16
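
The reason foo2 keeps the column names is that .SD is itself a named list of columns, so lapply has names to carry over. As a rough sketch of the same idea without the wrapper, the call can also be written inline. Here uniqueN(), data.table's shorthand for length(unique()), is swapped in for the original expression, and the exponent 2 is hard-coded purely for illustration.

```r
library(data.table)

dt <- data.table(
  x = c(1, 2, 3, 4, 5),
  y = c(1, 1, 3, 4, 5),
  z = c(1, 1, 1, 4, 5)
)

# .SD holds the columns listed in .SDcols, names included,
# so the result is named x and y without any extra work
res <- dt[, lapply(.SD, function(stat) uniqueN(stat)^2), .SDcols = c("x", "y")]
res

# The same call grouped by z
res_by_z <- dt[, lapply(.SD, function(stat) uniqueN(stat)^2),
               .SDcols = c("x", "y"), by = "z"]
res_by_z
```

This trades the .(x, y) syntax for a character vector of column names, which is easier to build programmatically.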


 