R data.table: update join

Suppose I have two data.tables:

X <- data.table(id = 1:5, L = letters[1:5])

   id L
1:  1 a
2:  2 b
3:  3 c
4:  4 d
5:  5 e

Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

   id  L  N
1:  3 NA 10
2:  4  g NA
3:  5  h 12

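Is it possible to do a left outer join of X and Y by id using data.table's built-in functions? If not, I would like to build a function (say, leftOuterJoin) that produces the following expected output: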

leftOuterJoin(X, Y, on = "id")

   id  L  N
1:  1  a NA
2:  2  b NA
3:  3 NA 10
4:  4  g NA
5:  5  h 12

I tried this, without success:

X[Y, on = "id"]

   id L i.L  N
1:  3 c  NA 10
2:  4 d   g NA
3:  5 e   h 12
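
For context, here is a small sketch of why a plain join in either direction falls short. It uses only the X and Y from the question; the names right and left are just illustrative. X[Y] looks up the rows of Y in X, so it behaves like a right join with respect to X, while Y[X] keeps every id of X but returns both L columns instead of overwriting one with the other.

```r
library(data.table)

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# X[Y] looks up each row of Y in X, so only ids 3, 4 and 5 come back.
right <- X[Y, on = "id"]

# Y[X] looks up each row of X in Y, so all five ids are kept,
# but both L columns survive: L (from Y) and i.L (from X).
left <- Y[X, on = "id"]
print(left)
```

Neither result replaces X's L with Y's L on the matching rows only, which is what the expected output asks for.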

I also tried this, which is almost what I am looking for:

setkey(X, id)
setkey(Y, id)
merge(X, Y, all.x = TRUE)

   id L.x L.y  N
1:  1   a  NA NA
2:  2   b  NA NA
3:  3   c  NA 10
4:  4   d   g NA
5:  5   e   h 12

This is an update join:

library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
X[Y, on=.(id), c("L", "N"):=.(i.L, i.N)][]
#    id  L  N
# 1:  1  a NA
# 2:  2  b NA
# 3:  3 NA 10
# 4:  4  g NA
# 5:  5  h 12

This gives you the desired result.
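
One thing to keep in mind is that := modifies X by reference. If the original X has to stay untouched, a minimal sketch (assuming an extra copy is acceptable memory-wise; res is just an illustrative name) is to run the same update join on copy(X):

```r
library(data.table)

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# Same update join as above, applied to a deep copy so X itself is not changed.
res <- copy(X)[Y, on = .(id), c("L", "N") := .(i.L, i.N)]

print(res)  # updated copy
print(X)    # original, still two columns
```

Rows of X with no match in Y (ids 1 and 2) keep their L, and the new column N is NA for them.
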
Here is a solution I found for multiple columns:

library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

X[Y, on=.(id), names(Y)[-1]:=mget(paste0("i.", names(Y)[-1]))]

Another variant:

n <- names(Y)
X[Y, on=.(id), (n):=mget(paste0("i.", n))]
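
The same pattern can be wrapped into the leftOuterJoin function the question asks for. This is only a sketch under some assumptions: the name and signature come from the question, on is taken to be a character vector of column names present in both tables, and the function works on a copy so that x is not modified by reference.

```r
library(data.table)

# Hypothetical helper: update join of y into a copy of x.
leftOuterJoin <- function(x, y, on) {
  cols <- setdiff(names(y), on)  # every column of y except the join column(s)
  out <- copy(x)                 # leave the caller's table untouched
  out[y, on = on, (cols) := mget(paste0("i.", cols))]
  out[]
}

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

res <- leftOuterJoin(X, Y, on = "id")
print(res)
```

Excluding the join columns from cols avoids overwriting id with itself, which is what the (n) variant above does.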

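I may be missing something, so please correct me if there is a better solution. I usually like to write functions for this kind of thing.

Here is one. The goal is to cover all the available possibilities: join and also update the join variable, use different variable names, and so on.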

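> update.DT <- function(DATA1, DATA2, join.variable, overwrite.variable, overwrite.with.variable) {
+   # Update join: overwrite columns of DATA1 (by reference) with the matching i.-prefixed columns of DATA2
+   DATA1[DATA2, c(overwrite.variable) := mget(paste0("i.", overwrite.with.variable)), on = join.variable][]
+ }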
> X <- data.table(id = 1:5, L = letters[1:5], PS = rep(59, 5))
> X2 <- copy(X); X3 <- copy(X)  # independent copies; a chained `<-` would bind all three names to one table
> Y <- data.table(id = 3:5, id2 = 11:13, L = c("z", "g", "h"), PS = rep(61, 3))
> X
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 c 59
4:  4 d 59
5:  5 e 59
> Y
   id id2 L PS
1:  3  11 z 61
2:  4  12 g 61
3:  5  13 h 61
> update.DT(DATA1 = X, DATA2 = Y, join.variable = "id", overwrite.variable = c("L"), overwrite.with.variable = c("L"))
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 z 59
4:  4 g 59
5:  5 h 59
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS"), overwrite.with.variable = c("L", "PS"))
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 z 61
4:  4 g 61
5:  5 h 61
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS", "id"), overwrite.with.variable = c("L", "PS", "id2"))
   id L PS
1:  1 a 59
2:  2 b 59
3: 11 z 61
4: 12 g 61
5: 13 h 61
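
A side note on the setup for this kind of test: because := updates by reference, plain assignment with <- does not create an independent data.table, whereas copy() does. A small illustration with made-up tables A, B and C:

```r
library(data.table)

A <- data.table(id = 1:3, L = letters[1:3])
B <- A        # second name for the same table, not a copy
C <- copy(A)  # independent deep copy

A[, L := toupper(L)]  # update by reference

print(B$L)  # follows the change made through A
print(C$L)  # unchanged
```

So when several variants of update.DT are compared against the same starting data, each one should start from its own copy().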
