Using R, data.table, conditionally sum columns

I have a data.table similar to this one (except that the real one has 150 columns and about 5 million rows):

library(data.table)

set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"),
             t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1),
             t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
             t7=round(rnorm(10),1), t8=round(rnorm(10),1))

which outputs:

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

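Using data.table, I would like to add a new column named Total (using :=) that contains the following:

For each row,

if Status = OUT, sum columns t1:t4 and t8

if Status = IN, sum columns t5, t6 and t8

if Status = ON, sum columns t1:t3 and t6:t8

The final output should look like this: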

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8  Total
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2

I am new to data.table (currently using version 1.9.6), and I would like to solve this with efficient data.table syntax.

I think doing it one status at a time, as suggested in the comments, is perfectly fine, but you can also create a lookup table:

cond = data.table(Status = c("OUT", "IN", "ON"),
                  cols = Map(paste0, 't', list(c(1:4, 8), c(5,6,8), c(1:3, 6:8))))
#   Status              cols
#1:    OUT    t1,t2,t3,t4,t8
#2:     IN          t5,t6,t8
#3:     ON t1,t2,t3,t6,t7,t8

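# Join dt to cond on Status. With by = .EACHI, j runs once per row of cond,
# so cols[[1]] is that status's column names and .SD is the matching rows of dt.
dt[cond, Total := Reduce(`+`, .SD[, cols[[1]], with = FALSE]), on = 'Status', by = .EACHI]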
#    ID Status   t1   t2   t3   t4   t5   t6   t7   t8 Total
# 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5   3.7
# 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7   0.6
# 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6   1.6
# 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9  -1.4
# 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3   1.4
# 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3  -1.0
# 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4  -0.5
# 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0  -0.2
# 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1   0.6
#10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6   2.2
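
For comparison, the one-status-at-a-time route mentioned above could look roughly like the sketch below. It rebuilds the question's dt so that it runs on its own. The use of rowSums() together with .SDcols is an assumption about what the comments had in mind, not something quoted from them.

```r
library(data.table)

# Rebuild the example data from the question (same seed, same column order)
set.seed(1)
dt <- data.table(ID = 1:10,
                 Status = c(rep("OUT", 2), rep("IN", 2), "ON",
                            rep("OUT", 2), rep("IN", 2), "ON"),
                 t1 = round(rnorm(10), 1), t2 = round(rnorm(10), 1),
                 t3 = round(rnorm(10), 1), t4 = round(rnorm(10), 1),
                 t5 = round(rnorm(10), 1), t6 = round(rnorm(10), 1),
                 t7 = round(rnorm(10), 1), t8 = round(rnorm(10), 1))

# One update per status: i picks the rows, .SDcols picks the columns to add up,
# and := writes the row sums into Total by reference for those rows only.
# Rows not yet matched by any statement stay NA.
dt[Status == "OUT", Total := rowSums(.SD), .SDcols = c(paste0("t", 1:4), "t8")]
dt[Status == "IN",  Total := rowSums(.SD), .SDcols = c("t5", "t6", "t8")]
dt[Status == "ON",  Total := rowSums(.SD), .SDcols = paste0("t", c(1:3, 6:8))]

print(dt)
```

With only three statuses this is arguably the easier version to read. The lookup table starts to pay off when there are many statuses or when the column sets are maintained somewhere else.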
