Merge Two Lists of Data Frames by ID and DATE in R
I need to merge two lists of data frames by two key variables, ID and DATE. Here is an example of the data that I have:
names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02- 29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1
names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2
I tried to use the Map function, but the output I got is messed up. Here is what I tried to do:
V <-Map(merge, list1, list2,MoreArgs=list(by=c('ID','DATE'), all=TRUE))
for (i in seq_along(V)) {
write.csv(V[[i]], paste0("merge_",i, ".csv"))
}
As the final output, I'd like to get one data frame with ID = 115477 and fully populated variables such as DATE, SCORE and Fprice, and another data frame with ID = 22319 that is fully populated as well. For example, for ID = 115477 I'd like to get:
ID DATE SCORE Fprice
115477 2012-01-31 677 NA
115477 2012-02-29 635 6543
Does anyone have any idea what I am doing wrong? Thank you for your help.
Here is a tidyverse approach:
library(tidyverse);
list(bind_rows(list1), bind_rows(list2)) %>%
reduce(function(x, y) full_join(x, y, by = c("ID", "DATE"))) %>%
filter(ID %in% c(115477))
# ID DATE SCORE Fprice
#1 115477 2012-01-31 677 NA
#2 115477 2012-02-29 635 6543
Explanation: For each list we bind the rows into a single data.frame; we collect the two collapsed data.frames in a list and then perform an outer join by "ID" and "DATE"; finally we use dplyr::filter to pull out the rows of interest (here ID == 115477).
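Since the question asks for one data frame per ID rather than a single filtered result, the same idea can be pushed one step further. The following is only a sketch: it swaps in base R's merge() and split() for full_join() and filter(), assumes the DATE typo in mydf1 has already been corrected, and the names joined and by_id are made up for illustration.

```r
# Sample data as in the question, with the DATE typo already corrected
list1 <- list(
  df1 = data.frame(ID = c(115477, 115477), DATE = c("2012-01-31", "2012-02-29"), SCORE = c(677, 635)),
  df2 = data.frame(ID = c(22319, 22319), DATE = c("2011-09-30", "2011-10-31"), SCORE = c(621, 630))
)
list2 <- list(
  df_auto1 = data.frame(ID = c(22319, 22319), DATE = c("2011-09-30", "2011-10-31"), Fprice = c(8708, 8708)),
  df_auto2 = data.frame(ID = c(115477, 115477), DATE = c("2012-01-31", "2012-02-29"), Fprice = c(NA, 6543))
)

# Stack each list into one data frame, then outer-join the two stacked data frames
joined <- merge(do.call(rbind, list1), do.call(rbind, list2),
                by = c("ID", "DATE"), all = TRUE)

# Split into one data frame per ID; the list names are the ID values as character
by_id <- split(joined, joined$ID)
by_id[["115477"]]
```

Each element of by_id can then be written out with write.csv() in a loop, as in the question.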
Conduct the merge() inside of mapply(). The end result is a list containing two data frames, each one the result of an element of list2 (j) being outer joined onto the element of list1 at the same position (i).
Note: There was a typo in the second DATE element within mydf1 that is corrected below. My answer depends on list1 and list2 holding data frames that contain the same ID value, in the same order. As the OP has it arranged, mydf_auto2 is set to be merged onto mydf2, because elements are paired by position; whereas mydf_auto2 should be merged onto mydf1, since these two data frames share the same ID value. I revise the ordering within list2 to produce the desired output.
# create first list of data frames
names1 <- c("df1", "df2")
# note the extra spacing in "2012-02-29" has been corrected
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1
# create second list of data frames
names2 <- c("df_auto1", "df_auto2")
# here is where I relabel the data frames
# to sync with `mydf1` and `mydf2` based on
# the `ID` values contained in `mydf_auto1` and `mydf_auto2`
mydf_auto1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
mydf_auto2 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2
# merge the list of data frames together
merged.list.of.dfs <-
mapply( FUN = function( i, j )
merge( x = i
, y = j
, by = c( "ID", "DATE" )
, all = TRUE )
, list1
, list2
, SIMPLIFY = FALSE )
# view results
merged.list.of.dfs
# $df1
# ID DATE SCORE Fprice
# 3 115477 2012-01-31 677 NA
# 4 115477 2012-02-29 635 6543
#
# $df2
# ID DATE SCORE Fprice
# 1 22319 2011-09-30 621 8708
# 2 22319 2011-10-31 630 8708
# end of script #
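The reordering above was done by hand. As a rough sketch of how the pairing could be derived from the data instead, the elements of list2 can be aligned to list1 by looking up their ID values with match(). This assumes every data frame holds exactly one distinct ID and that every ID in list1 also appears in list2; first_id, list2_aligned and merged are hypothetical names, not part of the original answer.

```r
# Lists in the OP's original order (list2 is "out of sync" with list1)
list1 <- list(
  df1 = data.frame(ID = c(115477, 115477), DATE = c("2012-01-31", "2012-02-29"), SCORE = c(677, 635)),
  df2 = data.frame(ID = c(22319, 22319), DATE = c("2011-09-30", "2011-10-31"), SCORE = c(621, 630))
)
list2 <- list(
  df_auto1 = data.frame(ID = c(22319, 22319), DATE = c("2011-09-30", "2011-10-31"), Fprice = c(8708, 8708)),
  df_auto2 = data.frame(ID = c(115477, 115477), DATE = c("2012-01-31", "2012-02-29"), Fprice = c(NA, 6543))
)

# Hypothetical helper: the ID carried by a data frame (assumes one ID per data frame)
first_id <- function(d) d$ID[1]

ids1 <- vapply(list1, first_id, numeric(1))
ids2 <- vapply(list2, first_id, numeric(1))

# Reorder list2 so that position k holds the same ID as position k of list1
list2_aligned <- list2[match(ids1, ids2)]

# Pairwise outer join; the result keeps the names of list1
merged <- Map(merge, list1, list2_aligned,
              MoreArgs = list(by = c("ID", "DATE"), all = TRUE))
merged$df1
```

If an ID in list1 has no counterpart in list2, match() returns NA and the subsetting step yields a NULL element, so that case would need to be handled separately.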
It would be easier for you to do a merge, then separately extract the IDs you want:
names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
# Note the change to use of rbind instead of list
list1 <- rbind(mydf1, mydf2)
names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- rbind(mydf_auto1,mydf_auto2)
df <- merge(list1, list2, by = c("ID", "DATE"))
df[df$ID == 115477,]
df[df$ID == 22319, ]
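Note that merge() without all = TRUE performs an inner join, so rows whose keys do not match exactly are silently dropped. The sketch below illustrates this with the uncorrected DATE value from the question ("2012-02- 29") and then cleans the key with gsub() before merging. The data frame names scores and prices and the whitespace-stripping step are assumptions for illustration only.

```r
# scores still contains the stray space from the question's data
scores <- data.frame(ID = c(115477, 115477),
                     DATE = c("2012-01-31", "2012-02- 29"),
                     SCORE = c(677, 635))
prices <- data.frame(ID = c(115477, 115477),
                     DATE = c("2012-01-31", "2012-02-29"),
                     Fprice = c(NA, 6543))

# Inner join: the row with the malformed DATE has no match and is dropped
inner <- merge(scores, prices, by = c("ID", "DATE"))

# Outer join: both unmatched rows are kept, padded with NA
outer <- merge(scores, prices, by = c("ID", "DATE"), all = TRUE)

# Remove all whitespace from the key, then merge again
scores$DATE <- gsub("\\s+", "", scores$DATE)
fixed <- merge(scores, prices, by = c("ID", "DATE"))

nrow(inner)
nrow(outer)
nrow(fixed)
```

Comparing the row counts of the inner and outer joins is a quick way to spot keys that fail to match.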