Merge Two Lists of Data Frames by ID and DATE in R

I need to merge two lists of data frames by two key variables, ID and DATE. Here is an example of the data that I have:

 names1 <- c("df1", "df2")
 mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-   29"), SCORE =c(677,635)) 
 mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
 list1 <- list(mydf1,mydf2)
 names(list1) <- names1

 names2 <- c("df_auto1", "df_auto2")
 mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
 mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
 list2 <- list(mydf_auto1,mydf_auto2)
 names(list2) <- names2

I tried to use the Map function, but the output I got is messed up. Here is what I tried:

 V <-Map(merge, list1, list2,MoreArgs=list(by=c('ID','DATE'), all=TRUE))

 for (i in seq_along(V)) {
 write.csv(V[[i]], paste0("merge_",i, ".csv"))
 }

As the final output, I'd like one data frame for ID = 115477 with DATE, SCORE and Fprice fully populated, and another data frame for ID = 22319 populated the same way. For example, for ID = 115477 I'd like to get:

  ID        DATE          SCORE    Fprice
 115477    2012-01-31     677     NA
 115477    2012-02-29     635     6543 

Does anyone have any idea what I am doing wrong? Thank you for your help.

Here is a tidyverse approach:

library(tidyverse);
list(bind_rows(list1), bind_rows(list2)) %>%
    reduce(function(x, y) full_join(x, y, by = c("ID", "DATE"))) %>%
    filter(ID %in% c(115477))
#      ID       DATE SCORE Fprice
#1 115477 2012-01-31   677     NA
#2 115477 2012-02-29   635   6543

Explanation: For each list we bind the rows into a single data.frame; we collect the two collapsed data.frames in a list and then perform an outer join by "ID" and "DATE"; finally we use dplyr::filter to pull out the rows of interest (here ID == 115477).
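Since the question asks for one data frame per ID rather than a single filtered result, the joined table can also be split by ID afterwards. This is only a minimal sketch, assuming dplyr is installed and that the date typo in mydf1 has been fixed; the names joined and per_id are illustrative and not part of the original answer.

```r
library(dplyr)

# Sample data from the question, with the date typo in mydf1 fixed
list1 <- list(
  df1 = data.frame(ID = c(115477, 115477),
                   DATE = c("2012-01-31", "2012-02-29"),
                   SCORE = c(677, 635), stringsAsFactors = FALSE),
  df2 = data.frame(ID = c(22319, 22319),
                   DATE = c("2011-09-30", "2011-10-31"),
                   SCORE = c(621, 630), stringsAsFactors = FALSE)
)
list2 <- list(
  df_auto1 = data.frame(ID = c(22319, 22319),
                        DATE = c("2011-09-30", "2011-10-31"),
                        Fprice = c(8708, 8708), stringsAsFactors = FALSE),
  df_auto2 = data.frame(ID = c(115477, 115477),
                        DATE = c("2012-01-31", "2012-02-29"),
                        Fprice = c(NA, 6543), stringsAsFactors = FALSE)
)

# Stack each list, then outer-join the two stacked tables on both keys
joined <- full_join(bind_rows(list1), bind_rows(list2), by = c("ID", "DATE"))

# One data frame per ID; the list names are the ID values as strings
per_id <- split(joined, joined$ID)
per_id[["115477"]]
```

Because the join runs on the stacked tables, the order of the data frames inside list1 and list2 does not matter here.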

Overview

Conduct the merge() inside of mapply().

The end result is a list containing two data frames, each one the result of an element of list2 (argument j) being outer joined onto the element of list1 at the same position (argument i).

Note: There was a typo in the second DATE element within mydf1 that is corrected below. My answer depends on list1 and list2 holding data frames with the same ID values, in the same order, because mapply() pairs the elements by position. As the OP has it arranged, mydf_auto1 (ID 22319) is set to be merged onto mydf1 (ID 115477), whereas the data frame that shares mydf1's ID is mydf_auto2. I revise the ordering within list2 so that each pair shares the same ID value, which produces the desired output.

# create first list of data frames
names1 <- c("df1", "df2")
# note the extra spacing in "2012-02-29" has been corrected
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635)) 
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1

# create second list of data frames
names2 <- c("df_auto1", "df_auto2")
# here is where I relabel the data frames
# to sync with `mydf1` and `mydf2` based on 
# the `ID` values contained in `mydf_auto1` and `mydf_auto2`
mydf_auto1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
mydf_auto2 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2

# merge the list of data frames together
merged.list.of.dfs <-
  mapply( FUN = function( i, j )
    merge( x = i
           , y = j
           , by = c( "ID", "DATE" )
           , all = TRUE )
    , list1
    , list2
    , SIMPLIFY = FALSE )

# view results
merged.list.of.dfs
# $df1
#       ID       DATE SCORE Fprice
# 1 115477 2012-01-31   677     NA
# 2 115477 2012-02-29   635   6543
# 
# $df2
#      ID       DATE SCORE Fprice
# 1 22319 2011-09-30   621   8708
# 2 22319 2011-10-31   630   8708

# end of script #
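Relabelling the data frames by hand works for two elements, but the same alignment can probably be done programmatically. The following is a rough sketch rather than part of the original answer: it assumes every data frame holds exactly one distinct ID and that every ID in list1 also appears in list2. The helper first_id and the name list2_aligned are hypothetical.

```r
# Sample data in the question's original order: list2 starts with ID 22319
list1 <- list(
  df1 = data.frame(ID = c(115477, 115477),
                   DATE = c("2012-01-31", "2012-02-29"),
                   SCORE = c(677, 635)),
  df2 = data.frame(ID = c(22319, 22319),
                   DATE = c("2011-09-30", "2011-10-31"),
                   SCORE = c(621, 630))
)
list2 <- list(
  df_auto1 = data.frame(ID = c(22319, 22319),
                        DATE = c("2011-09-30", "2011-10-31"),
                        Fprice = c(8708, 8708)),
  df_auto2 = data.frame(ID = c(115477, 115477),
                        DATE = c("2012-01-31", "2012-02-29"),
                        Fprice = c(NA, 6543))
)

# Assumption: each data frame contains a single ID, so the first row is enough
first_id <- function(d) d$ID[1]
ids1 <- vapply(list1, first_id, numeric(1))
ids2 <- vapply(list2, first_id, numeric(1))

# Reorder list2 so that element k carries the same ID as list1[[k]]
list2_aligned <- list2[match(ids1, ids2)]

# Pairwise outer join; the result keeps the names of list1
merged <- Map(merge, list1, list2_aligned,
              MoreArgs = list(by = c("ID", "DATE"), all = TRUE))
merged$df1
```

If an ID in list1 had no counterpart in list2, match() would return NA and the merge for that pair would fail, so a real script would want to check for that first.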

It would be easier to do a single merge and then extract the IDs you want separately:

names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635)) 
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
# Note the change to use of rbind instead of list
list1 <- rbind(mydf1, mydf2)

names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- rbind(mydf_auto1,mydf_auto2)

df <- merge(list1, list2, by = c("ID", "DATE"))
df[df$ID == 115477,]
df[df$ID == 22319, ]
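One caveat: merge() without all = TRUE performs an inner join, so any row whose ID and DATE pair appears in only one table is dropped silently. A small sketch of the difference, using mydf1 exactly as posted in the question (malformed date string included); the names scores, prices, inner and outer are illustrative only.

```r
# mydf1 as posted in the question, including the malformed date string
scores <- data.frame(ID = c(115477, 115477),
                     DATE = c("2012-01-31", "2012-02-   29"),
                     SCORE = c(677, 635))
prices <- data.frame(ID = c(115477, 115477),
                     DATE = c("2012-01-31", "2012-02-29"),
                     Fprice = c(NA, 6543))

# Inner join: only key pairs present in both tables survive
inner <- merge(scores, prices, by = c("ID", "DATE"))

# Outer join: unmatched rows from either side are kept and padded with NA
outer <- merge(scores, prices, by = c("ID", "DATE"), all = TRUE)

nrow(inner)  # 1: the malformed date has no partner in prices
nrow(outer)  # 3: both versions of the February row are kept
```

Comparing the two row counts is a quick way to spot keys that failed to match, such as the stray spaces in the date above.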
