
Apply a function to a subset of data.table columns, by column indices instead of name

I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.

library(data.table)

a <- data.table(
  a=as.character(rnorm(5)),
  b=as.character(rnorm(5)),
  c=as.character(rnorm(5)),
  d=as.character(rnorm(5))
)
b <- c('a','b','c','d')

With the MWE above, this:

a[,b=as.numeric(b),with=F]

works, but this:

a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F]

doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a, without referring to them individually?

(In the actual data set there are tens of columns, so naming each one would be impractical.)

The idiomatic approach is to use .SD and .SDcols.

You can force the LHS of := to be evaluated in the parent frame by wrapping it in (), so that b is read as a variable holding column names rather than as a literal column name.

a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
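To see what the parentheses change, here is a small sketch on a toy table. The names dt, x, y and cols are illustrative assumptions, not part of the question. A bare symbol on the left of := is taken as a literal column name, while a parenthesised one is evaluated as a variable.

```r
library(data.table)

# Hypothetical toy table; dt, x, y and cols are illustrative names
dt <- data.table(x = c("1", "2"), y = c("3", "4"))
cols <- c("x", "y")

# Bare symbol: adds a new column literally called "cols"
dt1 <- copy(dt)[, cols := 0L]
names(dt1)

# Parenthesised: cols is evaluated, so columns x and y are overwritten
dt2 <- copy(dt)[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
sapply(dt2, class)
```

copy() is used here only so that each variant starts from the same untouched table, since := modifies its table by reference.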

For columns 2:3

a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]

or

mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
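As a quick check, the sketch below rebuilds a table with the same shape as the one in the question and confirms that only the selected columns change type. It uses fixed character values instead of rnorm() for reproducibility, and idx is an illustrative name.

```r
library(data.table)

# Same shape as the question's table, with fixed values for reproducibility
a <- data.table(
  a = c("0.1", "0.2"),
  b = c("1.5", "2.5"),
  c = c("-3", "4"),
  d = c("7", "8")
)

idx <- 2:3  # column positions to convert (illustrative name)
a[, (idx) := lapply(.SD, as.numeric), .SDcols = idx]

# Inspect the column types after the in-place update
sapply(a, class)
```

Because := updates by reference, no reassignment of a is needed. Columns b and c become numeric, and a and d stay character.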
