
Apply function across subset of columns in data.table with .SDcols

I want to apply a function over a subset of variables in a data.table. In this case I'm simply changing variable types. I can do this a few different ways in data.table, but I'm looking for a way that does not require an intermediate assignment (mycols in this example) and does not require me to specify the columns I want to change twice. Here is a simplified reproducible example:

library(data.table)
set.seed(1)  # fix the seed so the sampled values are reproducible
n <- 30
# b, c1235 and d7777 hold dates stored as character strings
dt <- data.table(
  a     = sample(1:5, n, replace = TRUE),
  b     = as.character(sample(seq(as.Date('2011-01-01'), as.Date('2015-01-01'), length.out = n))),
  c1235 = as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out = n))),
  d7777 = as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out = n)))
)

WAY 1: this works, but the column names are hard-coded.

mycols <- c('b', 'c1235', 'd7777')
dt1 <- dt[,(mycols):=lapply(.SD, as.Date), .SDcols=mycols]

WAY 2: this works, but I need to create an intermediate object (mycols) for it to work.

mycols <- which(sapply(dt, class)=='character')
dt2 <- dt[,(mycols):=lapply(.SD, as.Date), .SDcols=mycols]

WAY 3: this works, but I need to specify this long expression twice.

dt3 <- dt[,(which(sapply(dt, class)=='character')):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')]
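A note for anyone running WAY 1 to 3 one after another: `:=` updates `dt` by reference, so `dt1`, `dt2` and `dt3` are not independent copies, and once one snippet has run there are no character columns left for the next one to find. The sketch below, on a small made-up table, uses `copy()` so that the original stays untouched; the table contents are illustrative assumptions, not the data from the example above.

```r
library(data.table)

# Small illustrative table (assumed data, not the original dt)
dt <- data.table(a = 1:2, b = c("2011-01-01", "2015-01-01"))

mycols <- which(sapply(dt, class) == "character")

# copy() makes a separate table, so := changes dt2 only
dt2 <- copy(dt)[, (mycols) := lapply(.SD, as.Date), .SDcols = mycols]

class(dt$b)   # still character
class(dt2$b)  # now Date
```

Without `copy()`, the same call would convert the column in `dt` itself.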

WAY 4: this doesn't work, but I want something like it that lets me specify the variables that make up .SDcols only once. I'm looking for some way to replace (.SD):= with something that works, or to chain things together. Really, I'd be curious to see whether anyone has a method for doing what WAY 1, 2 and 3 do without an intermediate assignment that bloats the environment and without specifying the same columns twice.

dt3 <- dt[,(.SD):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')]

Here's a one-line answer:

for (j in which(sapply(dt, class) == 'character')) set(dt, i = NULL, j = j, value = as.Date(dt[[j]]))
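Spelled out over several lines, the same loop might look like the sketch below. The three-row table is an assumption made for illustration, and `vapply(..., is.character, logical(1))` is swapped in for `sapply(dt, class) == 'character'` as a slightly stricter way to build the same selection.

```r
library(data.table)

# Small illustrative table (assumed data)
dt <- data.table(
  a     = c(1L, 2L, 3L),
  b     = c("2011-01-01", "2012-06-15", "2015-01-01"),
  c1235 = c("2012-01-01", "2012-07-01", "2013-01-01")
)

# The column selection is written exactly once, in the for() header.
for (j in which(vapply(dt, is.character, logical(1)))) {
  # set() overwrites column j of dt by reference; i = NULL means all rows
  set(dt, i = NULL, j = j, value = as.Date(dt[[j]]))
}

sapply(dt, class)
```

The only name this leaves behind is the loop index `j`; no vector of column names or positions is stored.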

Here's a question where Arun and Matt each prefer set with a for loop over using .SD:

How to apply same function to every specified column in a data.table
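If even the loop index should stay out of the calling environment, the set() loop can be wrapped in a small function. This is only a sketch: `convert_where` is a hypothetical helper name, not part of data.table, and the example table is made up.

```r
library(data.table)

# Hypothetical helper (not a data.table function): apply f to every
# column of x for which pred() returns TRUE, modifying x by reference.
convert_where <- function(x, pred, f) {
  for (j in which(vapply(x, pred, logical(1)))) {
    set(x, i = NULL, j = j, value = f(x[[j]]))
  }
  invisible(x)  # return x invisibly so calls can be chained
}

# Small illustrative table (assumed data)
dt <- data.table(
  a     = 1:2,
  b     = c("2011-01-01", "2015-01-01"),
  d7777 = c("2012-03-01", "2012-09-01")
)

convert_where(dt, is.character, as.Date)
sapply(dt, class)
```

The predicate is given once and the loop index lives only inside the function, at the cost of defining one reusable helper.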
