R: Merging data.frames with multiple conditions & multiple logical operators

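Good day,

I have run into a challenging problem and would like to find an elegant way to:

  1. Combine two data.frames by:

    a. Two common variables;

    b. A date variable, i.e. if DATE >= START_DATE & DATE <= END_DATE;

    c. A combined code/ID variable, i.e. if CODE_X == CODE_ID | CODE_X == ID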

Here is data.frame 1:

CODE_ID = c("A01", "A10", "E01", "C01", "T01")
ID = c("A", "A", "E", "C", "T")
DATE = c("2008-07-01", "2008-07-01", "2009-08-01", "2008-09-01", "2009-10-01")
TF_1 = c("F", "F", "F", "F", "F")
D_VAR_1 = c("D_0101", "D_0101", "D_0101", "D_0101", "D_0102")

DF1 = data.frame(CODE_ID, ID, DATE, TF_1, D_VAR_1)

Here is data.frame 2:

CODE_X = c("A", "A10", "E", "C", "T01")
START_DATE = c("2008-07-01", "2009-07-01", "2009-07-01", "2008-07-01", "2009-07-01")
END_DATE= c("2009-06-30", "2010-06-30", "2010-06-30", "2009-06-30", "2010-06-30")
TF_2 = c("F", "F", "F", "F", "F")
D_VAR_2 = c("D_0101", "D_0102", "D_0101", "D_0101", "D_0102")
NAME = c("ACCIDENT", "MISC ACCIDENT", "ENERGY", "CONSTRUCTION", "POLITICS")

DF2 = data.frame(CODE_X, START_DATE, END_DATE, TF_2, D_VAR_2, NAME)

My final data.frame 3 would look like this:

CODE_ID = c("A01", "A10", "E01", "C01", "T01")
ID = c("A", "A", "E", "C", "T")
DATE = c("2008-07-01", "2008-07-01", "2009-08-01", "2008-09-01", "2009-10-01")
TF_1 = c("F", "F", "F", "F", "F")
D_VAR_1 = c("D_0101", "D_0101", "D_0101", "D_0101", "D_0102")
NAME = c("ACCIDENT", "MISC ACCIDENT", "ENERGY", "CONSTRUCTION", "POLITICS")

DF3 = data.frame(CODE_ID, ID, DATE, TF_1, D_VAR_1, NAME)

Try the sqldf package. It lets you combine data frames as if you were writing an SQL query, which helps with more complex joins like this one.

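library(sqldf)

# Inner join: match CODE_X against either CODE_ID or ID,
# and require DATE to fall inside [START_DATE, END_DATE].
sqldf.Example <- sqldf('
  select DF1.*, DF2.NAME
  from DF1
  join DF2
    on (DF1.CODE_ID = DF2.CODE_X or DF1.ID = DF2.CODE_X)
   and DF1.DATE between DF2.START_DATE and DF2.END_DATE
')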
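If you want to check the join logic without installing a package, here is a rough base R sketch of the same conditions: a cross join via merge(..., by = NULL) followed by a filter. It rebuilds the question's data with stringsAsFactors = FALSE and keeps only the DF2 columns the conditions need; both choices are my assumptions, not part of the original question.

```r
# Inputs rebuilt from the question (character columns, not factors: an assumption)
DF1 <- data.frame(
  CODE_ID = c("A01", "A10", "E01", "C01", "T01"),
  ID      = c("A", "A", "E", "C", "T"),
  DATE    = c("2008-07-01", "2008-07-01", "2009-08-01", "2008-09-01", "2009-10-01"),
  TF_1    = rep("F", 5),
  D_VAR_1 = c("D_0101", "D_0101", "D_0101", "D_0101", "D_0102"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  CODE_X     = c("A", "A10", "E", "C", "T01"),
  START_DATE = c("2008-07-01", "2009-07-01", "2009-07-01", "2008-07-01", "2009-07-01"),
  END_DATE   = c("2009-06-30", "2010-06-30", "2010-06-30", "2009-06-30", "2010-06-30"),
  NAME       = c("ACCIDENT", "MISC ACCIDENT", "ENERGY", "CONSTRUCTION", "POLITICS"),
  stringsAsFactors = FALSE
)

# Cross join: every DF1 row paired with every DF2 row
pairs <- merge(DF1, DF2, by = NULL)

# Keep pairs that satisfy the code condition AND the date window
d    <- as.Date(pairs$DATE)
keep <- (pairs$CODE_X == pairs$CODE_ID | pairs$CODE_X == pairs$ID) &
  d >= as.Date(pairs$START_DATE) & d <= as.Date(pairs$END_DATE)

# Keep DF1's columns plus NAME, restored to DF1's row order
res <- pairs[keep, c(names(DF1), "NAME")]
res <- res[order(match(res$CODE_ID, DF1$CODE_ID)), ]
rownames(res) <- NULL
print(res)
```

A cross join builds nrow(DF1) * nrow(DF2) rows before filtering, so this is only sensible for small inputs. For larger data the sqldf query is likely the better fit.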

Another option, using non-equi update joins from data.table:

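library(data.table) # data.table v1.12.4
setDT(DF1)
setDT(DF2)

# First pass: match on CODE_ID == CODE_X within the date window
DF1[DF2, on=.(CODE_ID=CODE_X, DATE>=START_DATE, DATE<=END_DATE), NAME := i.NAME]
# Second pass: match on ID == CODE_X, filling NAME only where it is still NA
DF1[DF2, on=.(ID=CODE_X, DATE>=START_DATE, DATE<=END_DATE),
    NAME := fifelse(is.na(x.NAME), i.NAME, x.NAME)]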

output:

   CODE_ID ID       DATE TF_1 D_VAR_1         NAME
1:     A01  A 2008-07-01    F  D_0101     ACCIDENT
2:     A10  A 2008-07-01    F  D_0101     ACCIDENT
3:     E01  E 2009-08-01    F  D_0101       ENERGY
4:     C01  C 2008-09-01    F  D_0101 CONSTRUCTION
5:     T01  T 2009-10-01    F  D_0102     POLITICS
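Note that row 2 comes out as ACCIDENT here, while the desired DF3 in the question lists MISC ACCIDENT. As far as I can tell this follows from the date condition as stated: the A10 entry in DF2 only starts on 2009-07-01, so a DATE of 2008-07-01 can only match through ID == "A". A quick check, where in_window is just a hypothetical helper for illustration:

```r
# Hypothetical helper: is date d inside the closed window w = c(start, end)?
in_window <- function(d, w) d >= w[1] & d <= w[2]

date_row2  <- as.Date("2008-07-01")                    # DATE for CODE_ID "A10" in DF1
window_a10 <- as.Date(c("2009-07-01", "2010-06-30"))   # CODE_X "A10" in DF2
window_a   <- as.Date(c("2008-07-01", "2009-06-30"))   # CODE_X "A" in DF2

in_window(date_row2, window_a10)  # FALSE: outside the MISC ACCIDENT window
in_window(date_row2, window_a)    # TRUE: inside the ACCIDENT window
```

If MISC ACCIDENT really is the intended result for that row, the date condition or the sample dates would need to change.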

