
Changing multiple columns in a data.table in R

I am looking for a way to manipulate multiple columns in a data.table in R. As I have to address the columns dynamically as well as a second input, I wasn't able to find an answer.

The idea is to index two or more series on a certain date by dividing all values of each series by its value on that date, e.g.:

library(data.table)

set.seed(132)
# simulate some data
dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = cumsum(rnorm(10)),
                 X2 = cumsum(rnorm(10)))

# set a date for the index
indexDate <- as.Date("2000-01-05")

# get the column names to be able to select the columns dynamically
cols <- colnames(dt)
cols <- cols[substr(cols, 1, 1) == "X"]

Part 1: The Easy data.frame/apply approach

df <- as.data.frame(dt)
# get the right rownumber for the indexDate
rownum <- max((1:nrow(df))*(df$date==indexDate))

# use apply to iterate over all columns
df[, cols] <- apply(df[, cols], 
                    2, 
                    function(x, i){x / x[i]}, i = rownum)
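
The row lookup above multiplies a logical vector by the row numbers, so a date that is missing from the table silently gives a row number of 0. As a minimal sketch, assuming the index date should occur exactly once in the date column, match() finds the same row and makes the missing case explicit. The stand-alone `dates` vector is only there to keep the example self-contained:

```r
# Sketch only: find the index row with match() instead of max(... * ...)
dates <- seq(from = as.Date("2000-01-01"), by = "days", length.out = 10)
indexDate <- as.Date("2000-01-05")

# match() returns the position of the first match, or NA if there is none
rownum <- match(indexDate, dates)
if (is.na(rownum)) stop("indexDate not found in the date column")

rownum
```

In the question's setting, `dates` would simply be `df$date` or `dt$date`.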

Part 2: The (fast) data.table approach

So far my data.table approach looks like this:

for(nam in cols) {
  div <- as.numeric(dt[rownum, nam, with = FALSE])
  dt[ , 
     nam := dt[,nam, with = FALSE] / div,
     with=FALSE]
}

In particular, all the with = FALSE arguments do not look very data.table-like.

Do you know any faster/more elegant way to perform this operation?

Any idea is greatly appreciated!

One option would be to use set(), as this involves multiple columns. The advantage of set() is that it avoids the overhead of [.data.table, which makes it faster.

library(data.table)
for(j in cols){
  set(dt, i=NULL, j=j, value= dt[[j]]/dt[[j]][rownum])
}

Or a slightly slower option would be

dt[, (cols) :=lapply(.SD, function(x) x/x[rownum]), .SDcols=cols]
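
To check that the two variants really are interchangeable, one can run both on copies of the same table and compare the results. This is only a sketch: the names `dtSet`, `dtSD` and `sameResult` are made up for the illustration, and the row number is looked up with which(), assuming the index date occurs exactly once.

```r
library(data.table)

set.seed(132)
dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = cumsum(rnorm(10)),
                 X2 = cumsum(rnorm(10)))
cols <- grep("^X", names(dt), value = TRUE)
rownum <- which(dt$date == as.Date("2000-01-05"))

# work on copies so both variants start from the same values
dtSet <- copy(dt)
dtSD  <- copy(dt)

# variant 1: loop over the columns with set()
for (j in cols) {
  set(dtSet, i = NULL, j = j, value = dtSet[[j]] / dtSet[[j]][rownum])
}

# variant 2: assign all columns at once with := and .SD
dtSD[, (cols) := lapply(.SD, function(x) x / x[rownum]), .SDcols = cols]

# TRUE if both tables hold the same values
sameResult <- isTRUE(all.equal(dtSet, dtSD))
sameResult
```

Both variants modify their table by reference, which is why the copies are taken with copy() rather than plain assignment.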

Following up on your code and the answer given by akrun, I would recommend using .SDcols to select the numeric columns and lapply to loop through them. Here's how I would do it:

index <-as.Date("2000-01-05")

rownum<-max((dt$date==index)*(1:nrow(dt)))

dt[, lapply(.SD, function (i) i/i[rownum]), .SDcols = is.numeric]

Using .SDcols can be especially useful if you have a large number of numeric columns and would like to apply this division to all of them.
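
One thing to keep in mind with the call above: without := it returns a new table containing only the selected columns, and dt itself stays unchanged. The sketch below shows the difference. It selects the numeric columns by name through a helper vector `numCols` (a name made up for this example), which should also work on data.table versions that do not accept a function in .SDcols:

```r
library(data.table)

set.seed(132)
dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = cumsum(rnorm(10)),
                 X2 = cumsum(rnorm(10)))
rownum <- which(dt$date == as.Date("2000-01-05"))

# names of the numeric columns; is.numeric() is FALSE for Date columns
numCols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# without := the result is a new table holding only the numeric columns
res <- dt[, lapply(.SD, function(x) x / x[rownum]), .SDcols = numCols]
names(res)   # no date column; dt itself is unchanged at this point

# with := the same columns are overwritten in dt and the date column is kept
dt[, (numCols) := lapply(.SD, function(x) x / x[rownum]), .SDcols = numCols]
names(dt)
```

Which form fits better depends on whether the original values are still needed afterwards.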

