R data table: update join
Suppose I have two data tables:
X <- data.table(id = 1:5, L = letters[1:5])
id L
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
id L N
1: 3 NA 10
2: 4 g NA
3: 5 h 12
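Is it possible to do a left outer join of X and Y by id using data.table's built-in functions? If not, I would like to build a function (say, leftOuterJoin) with the following expected output: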
leftOuterJoin(X, Y, on = "id")
id L N
1: 1 a NA
2: 2 b NA
3: 3 NA 10
4: 4 g NA
5: 5 h 12
I tried this without success:
X[Y, on = "id"]
id L i.L N
1: 3 c NA 10
2: 4 d g NA
3: 5 e h 12
I also tried this, which is almost what I am looking for:
setkey(X, id)
setkey(Y, id)
merge(X, Y, all.x = TRUE)
id L.x L.y N
1: 1 a NA NA
2: 2 b NA NA
3: 3 c NA 10
4: 4 d g NA
5: 5 e h 12
This is an update join:
library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
X[Y, on=.(id), c("L", "N"):=.(i.L, i.N)][]
# id L N
# 1: 1 a NA
# 2: 2 b NA
# 3: 3 NA 10
# 4: 4 g NA
# 5: 5 h 12
This gives the result you want.
Here is a solution I found for multiple columns:
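The update join above modifies X in place. As a minimal sketch of the leftOuterJoin function the question asks for, the same join can be wrapped so that it works on a copy and takes every non-key column of Y. The function is hypothetical (it is not part of data.table), and it assumes that on is a character vector of column names shared by both tables.

```r
library(data.table)

# Hypothetical helper named after the question's leftOuterJoin();
# it is not a data.table function.
leftOuterJoin <- function(X, Y, on) {
  out <- copy(X)                    # work on a copy so X itself is not changed
  cols <- setdiff(names(Y), on)     # every column of Y except the join key(s)
  # Update join: for rows of out that match Y, take the values from Y
  out[Y, on = on, (cols) := mget(paste0("i.", cols))]
  out[]
}

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
res <- leftOuterJoin(X, Y, on = "id")
print(res)
```

Rows of X with no match in Y keep their values, and columns that exist only in Y (here N) are filled with NA for those rows.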
library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
X[Y, on=.(id), names(Y)[-1]:=mget(paste0("i.", names(Y)[-1]))]
Another variant:
n <- names(Y)
X[Y, on=.(id), (n):=mget(paste0("i.", n))]
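I may be missing something; please correct me if there is a better solution. I usually like to write functions for this kind of thing.
Here is one. The goal is to cover every possibility: joining, updating the join variable itself, using different variable names, and so on.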
> update.DT <- function(DATA1, DATA2, join.variable, overwrite.variable, overwrite.with.variable) {
+   # Update join: overwrite columns of DATA1 with the matching "i."-prefixed columns of DATA2
+   DATA1[DATA2, c(overwrite.variable) := mget(paste0("i.", overwrite.with.variable)), on = join.variable][]
+ }
> X <- data.table(id = 1:5, L = letters[1:5], PS = rep(59, 5))
> X2 <- copy(X); X3 <- copy(X)  # copy() so that each table is updated independently
> Y <- data.table(id = 3:5, id2 = 11:13, L = c("z", "g", "h"), PS = rep(61, 3))
> X
id L PS
1: 1 a 59
2: 2 b 59
3: 3 c 59
4: 4 d 59
5: 5 e 59
> Y
id id2 L PS
1: 3 11 z 61
2: 4 12 g 61
3: 5 13 h 61
> update.DT(DATA1 = X, DATA2 = Y, join.variable = "id", overwrite.variable = c("L"), overwrite.with.variable = c("L"))
id L PS
1: 1 a 59
2: 2 b 59
3: 3 z 59
4: 4 g 59
5: 5 h 59
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS"), overwrite.with.variable = c("L", "PS"))
id L PS
1: 1 a 59
2: 2 b 59
3: 3 z 61
4: 4 g 61
5: 5 h 61
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS", "id"), overwrite.with.variable = c("L", "PS", "id2"))
id L PS
1: 1 a 59
2: 2 b 59
3: 11 z 61
4: 12 g 61
5: 13 h 61
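One point to keep in mind with a helper like this: := updates by reference, so the table passed as DATA1 is itself changed, and any other name bound to the same table with a plain <- sees the change too. A minimal sketch of that behaviour, using made-up tables A, B and C that are not part of the example above:

```r
library(data.table)

A <- data.table(id = 1:3, L = letters[1:3])
B <- A          # plain assignment: B is another name for the same table
C <- copy(A)    # copy() creates an independent table

A[id == 2L, L := "z"]   # update by reference

print(B$L)   # B reflects the update, because it is the same object as A
print(C$L)   # C keeps the original letters
```

Calling copy() before an update join is the usual way to keep the original table intact.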