I have a data table similar to this (except it has 150 columns and about 5 million rows):
library(data.table)
set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"),
t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1),
t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
t7=round(rnorm(10),1),t8=round(rnorm(10),1))
which outputs:
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
Using data.table, I would like to add a new column (using :=) called Total that would contain the following:
For each row,
if Status=OUT, sum columns t1:t4 and t8
if Status=IN, sum columns t5,t6,t8
if Status=ON, sum columns t1:t3 and t6:t8
The final output should look like this:
ID Status t1 t2 t3 t4 t5 t6 t7 t8 Total
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5 3.7
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7 0.6
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6 1.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9 -1.4
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3 1.4
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3 -1.0
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4 -0.5
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0 -0.2
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1 0.6
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6 2.2
I am fairly new to data.table (currently using version 1.9.6) and would like to try for a solution using efficient data.table syntax.
I think doing it one by one, as suggested in comments, is perfectly fine, but you can also create a lookup table:
cond = data.table(Status = c("OUT", "IN", "ON"),
cols = Map(paste0, 't', list(c(1:4, 8), c(5,6,8), c(1:3, 6:8))))
# Status cols
#1: OUT t1,t2,t3,t4,t8
#2: IN t5,t6,t8
#3: ON t1,t2,t3,t6,t7,t8
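# join dt to cond on Status; for each row of cond (by = .EACHI),
# add up the columns named in that row's cols entry
dt[cond, Total := Reduce(`+`, .SD[, cols[[1]], with = FALSE]), on = 'Status', by = .EACHI]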
# ID Status t1 t2 t3 t4 t5 t6 t7 t8 Total
# 1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5 3.7
# 2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7 0.6
# 3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6 1.6
# 4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9 -1.4
# 5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3 1.4
# 6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3 -1.0
# 7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4 -0.5
# 8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0 -0.2
# 9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1 0.6
#10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6 2.2
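For comparison, the "one by one" approach mentioned above could look roughly like the sketch below: one update by reference per status, with the rows picked in i and the columns picked through .SDcols. The names dt_small and status_cols are made up for illustration, the three rows are copied from the question's table (IDs 1, 3 and 5), and rowSums is swapped in for Reduce(`+`, ...), so treat it as an assumption-laden sketch rather than the exact code from the comments.

```r
library(data.table)

# Three rows copied from the question's table (IDs 1, 3 and 5), one per status.
# dt_small is a hypothetical name used only for this sketch.
dt_small <- data.table(
  ID = c(1L, 3L, 5L),
  Status = c("OUT", "IN", "ON"),
  t1 = c(-0.6, -0.8, 0.3), t2 = c(1.5, -0.6, 1.1),
  t3 = c(0.9, 0.1, 0.6),   t4 = c(1.4, 0.4, -1.4),
  t5 = c(-0.2, 0.7, -0.7), t6 = c(0.4, 0.3, 1.4),
  t7 = c(2.4, 0.7, -0.7),  t8 = c(0.5, 0.6, -1.3)
)

# Columns to sum for each status (status_cols is also a hypothetical name).
status_cols <- list(
  OUT = paste0("t", c(1:4, 8)),
  IN  = paste0("t", c(5, 6, 8)),
  ON  = paste0("t", c(1:3, 6:8))
)

# One update per status: i selects the rows, .SDcols selects the columns,
# and := writes the row sums into Total by reference.
for (s in names(status_cols)) {
  dt_small[Status == s, Total := rowSums(.SD), .SDcols = status_cols[[s]]]
}

print(dt_small[, .(ID, Status, Total)])
```

With only three statuses the loop is just a compact way of writing three separate dt[Status == "...", Total := ...] lines. Rows whose Status is not in status_cols would be left as NA in Total.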