
Using .SD column names in lapply() with data tables

I am trying to make an operation conditional on the name of a column in a data.table. The example below illustrates what I mean. We have a DT with two columns, carrot and banana, each containing values. I want the carrot values to be multiplied by 2 and the banana values to be divided by 2. My code, however, does not work, because inside the function names(.SD) is a vector of length 2 (the same as names(DT)), not the name of the current column. Is there a way I can make this work with lapply()?

library(data.table)

carrot <- 1:5
banana <- 1:5

DT <- data.table(carrot, banana)

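# does not work: inside FUN, names(.SD) is c("carrot", "banana"), so the if() condition has length 2
DT[, lapply(.SD, function(x) if(names(.SD) == 'carrot') {x * 2} else {x / 2}), .SDcols = names(DT)]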
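To see why the attempt fails, here is a minimal sketch (assuming the data.table package is installed; `name_lengths` is just an illustrative name) that records what names(.SD) looks like inside the function passed to lapply(). Because .SD refers to the whole subset rather than the current column, the comparison names(.SD) == 'carrot' yields two values on every call. Since R 4.2.0, if() with a length-2 condition is an error; older versions only warned and used the first element, which is TRUE, so both columns would have been multiplied by 2.

```r
library(data.table)

DT <- data.table(carrot = 1:5, banana = 1:5)

# Inside FUN, .SD is the whole subset, not the current column,
# so names(.SD) has one entry per column on every call.
name_lengths <- DT[, lapply(.SD, function(x) length(names(.SD))), .SDcols = names(DT)]
print(name_lengths)

# if() needs a single TRUE/FALSE, but this comparison gives one value per column
cond <- names(DT) == "carrot"
print(cond)
```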

Do you have to do it in one operation? I think multiple operations are cleaner, e.g.:

carrot <- 1:5
banana <- 1:5

DT <- data.table(carrot, banana)

# simplest way: assign back to the original columns (or to new ones)
DT[, carrot := carrot*2]
DT[, banana := banana/2]

# lapply way: two passes, one per column group
DT <- data.table(carrot, banana)  # recreate DT, since the := calls above modified it in place
cols1 <- "carrot"
cols2 <- "banana"

# these return new tables; DT itself is not modified
DT[, lapply(.SD, function(x) x*2), .SDcols=cols1]
DT[, lapply(.SD, function(x) x/2), .SDcols=cols2]

# can also assign back in to DT
DT[, (cols1) := lapply(.SD, function(x) x*2), .SDcols=cols1]
DT[]
DT[, (cols2) := lapply(.SD, function(x) x/2), .SDcols=cols2]
DT[]
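If a single call is preferred after all, the two passes can be collapsed. Below is a sketch of two options, assuming a reasonably recent data.table: the functional form `:=`(...), and a hypothetical named list `col_funs` that maps each column name to its transformation and is paired with the .SD columns via base R's Map(). As with DT[, carrot := carrot*2], assigning a whole column replaces it, so the integer columns simply become double.

```r
library(data.table)

carrot <- 1:5
banana <- 1:5

# 1) Functional form of `:=`: both columns are updated by reference in one call
DT1 <- data.table(carrot, banana)
DT1[, `:=`(carrot = carrot * 2, banana = banana / 2)]

# 2) Hypothetical lookup list: one transformation per column name.
#    Map() pairs each function with the matching .SD column; the order lines up
#    because .SDcols = names(col_funs).
col_funs <- list(
  carrot = function(x) x * 2,
  banana = function(x) x / 2
)
DT2 <- data.table(carrot, banana)
DT2[, (names(col_funs)) := Map(function(f, x) f(x), col_funs, .SD),
    .SDcols = names(col_funs)]

DT1[]
DT2[]
```

The lookup-list version scales to more columns without extra passes: adding a column only means adding an entry to `col_funs`.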

The question/answer "Access lapply index names inside FUN" inspired the following solution, which loops over column positions so that each call can look up both the column and its name:

DT[, lapply(seq_along(names(.SD)),
            function(y, n, i) if(n[[i]] == 'carrot') {y[[i]] * 2} else {y[[i]] / 2},
            y = .SD,
            n = names(.SD)),
   .SDcols = names(DT)]
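A more compact variant of the same idea, sketched here on the assumption that base R's Map() is acceptable in place of lapply(): Map() walks .SD and names(.SD) in parallel, so each call receives one column (x) and its own name (nm). The seq_along() version above returns an unnamed list, so its result has columns V1 and V2; Map() takes the names from its first argument, so this variant keeps carrot and banana.

```r
library(data.table)

DT <- data.table(carrot = 1:5, banana = 1:5)

# Each call sees a single column and its name, so if() gets a length-1 condition.
res <- DT[, Map(function(x, nm) if (nm == "carrot") x * 2 else x / 2,
                .SD, names(.SD)),
          .SDcols = names(DT)]
res
```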
