
Apply a function to a subset of data.table columns, by column-indices instead of name

I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.

a <- data.table(
  a=as.character(rnorm(5)),
  b=as.character(rnorm(5)),
  c=as.character(rnorm(5)),
  d=as.character(rnorm(5))
)
b <- c('a','b','c','d')

With the MWE above, this:

a[,b=as.numeric(b),with=F]

works, but this:

a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F]

doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a without referring to them individually?

(The actual data set has tens of columns, so naming each one would be impractical.)

The idiomatic approach is to use .SD together with .SDcols, which restricts .SD to the columns you name or index.

You can force the LHS of := to be evaluated in the calling scope, rather than taken as a literal column name, by wrapping it in ():

a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
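To see what the parentheses change, here is a minimal sketch on a fresh table. The names dt and cols are made up for illustration and are not from the question. Without parentheses, data.table treats the symbol on the left of := as a literal column name; with them, it reads the column names from the variable.

```r
library(data.table)

# Illustrative names: dt and cols are not from the question.
dt <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5))
)
cols <- c("b", "c")

# Without parentheses, `cols` is taken as a literal column name,
# so this adds a new column called "cols" and leaves b and c alone.
dt[, cols := "oops"]
names(dt)           # now includes "cols"
dt[, cols := NULL]  # drop the stray column again

# With parentheses, the left-hand side is evaluated as a variable,
# so the columns named in `cols` are converted by reference.
dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
sapply(dt, class)
```

Because := modifies the table by reference, there is no need to assign the result back to dt.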

To pick columns 2 and 3 by position:

a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]

or, with the indices held in a variable:

mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
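A quick way to confirm the conversion is to inspect the column classes afterwards. The sketch below rebuilds the table from the question and assumes only that the data.table package is installed.

```r
library(data.table)

# Rebuild the example table from the question.
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)

mysubset <- 2:3
# Convert columns 2 and 3 in place; .SDcols limits .SD to those columns.
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]

# Columns b and c should now be numeric; a and d stay character.
sapply(a, class)
```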
