简体   繁体   中英

Using R, data.table, conditionally sum columns

I have a data table similar to this (except it has 150 columns and about 5 million rows):

set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"), 
             t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1), 
             t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
             t7=round(rnorm(10),1),t8=round(rnorm(10),1))

which outputs:

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

Using data.table, I would like to add a new column (using :=) called Total that would contain the following:

For each row,

if Status=OUT, sum columns t1:t4 and t8

if Status=IN, sum columns t5,t6,t8

if Status=ON, sum columns t1:t3 and t6:t8

The final output should look like this:

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8  Total
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2

I am fairly new to data.table (currently using version 1.9.6) and would like to try for a solution using efficient data.table syntax.

I think doing it one by one, as suggested in comments, is perfectly fine, but you can also create a lookup table:

cond = data.table(Status = c("OUT", "IN", "ON"),
                  cols = Map(paste0, 't', list(c(1:4, 8), c(5,6,8), c(1:3, 6:8))))
#   Status              cols
#1:    OUT    t1,t2,t3,t4,t8
#2:     IN          t5,t6,t8
#3:     ON t1,t2,t3,t6,t7,t8

dt[cond, Total := Reduce(`+`, .SD[, cols[[1]], with = F]), on = 'Status', by = .EACHI]
#    ID Status   t1   t2   t3   t4   t5   t6   t7   t8 Total
# 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
# 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
# 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
# 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
# 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
# 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
# 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
# 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
# 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
#10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM