简体   繁体   中英

Merge Two Lists of Data Frames by ID and DATE in R

I need to merge two lists of data frames by two key variables, ID and DATE. Here is an example of the data that I have:

 names1 <- c("df1", "df2")
 mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-   29"), SCORE =c(677,635)) 
 mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
 list1 <- list(mydf1,mydf2)
 names(list1) <- names1

 names2 <- c("df_auto1", "df_auto2")
 mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
 mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
 list2 <- list(mydf_auto1,mydf_auto2)
 names(list2) <- names2

I tried to use Map function but the output I got is messed up. Here is what I tried to do:

 V <-Map(merge, list1, list2,MoreArgs=list(by=c('ID','DATE'), all=TRUE))

 for (i in seq_along(V)) {
 write.csv(V[[i]], paste0("merge_",i, ".csv"))
 }

As the final output, I'd like to get one dataframe with ID = 115477 and fully populated variables such as DATE, SCORE and Fprice; another dataframe with ID = 22319 and fully populated as well. For example, for ID = 115477 I'd like to get:

  ID        DATE          SCORE    Fprice
 115477    2012-01-31     677     NA
 115477    2012-02-29     635     6543 

Does anyone have any idea of what I am doing wrong? Thank you for your help.

Here is a tidyverse approach:

library(tidyverse);
list(bind_rows(list1), bind_rows(list2)) %>%
    reduce(function(x, y) full_join(x, y, by = c("ID", "DATE"))) %>%
    filter(ID %in% c(115477))
#      ID       DATE SCORE Fprice
#1 115477 2012-01-31   677     NA
#2 115477 2012-02-29   635   6543

Explanation: For each list we bind rows into a single data.frame ; we collect the two collapsed data.frame s in a list and then perform an outer join by "ID" and "DATE" ; we use dplyr::filter to pull out the rows of interest (here ID==115477 ).

Overview

Conduct the merge() inside of mapply() .

The end result is a list containing two data frames, each one the result of j th element in list2 being outer joined onto the i th element in list1 .

Note: There was a typo in the second DATE element within mydf1 that is corrected below. My answer depends on the contents of list1 and list2 possessing data frames that contain the same ID value, in the same order. As the OP has it arranged, mydf_auto2 is set to be merged onto mydf1 ; whereas mydf_auto2 should be merged onto mydf2 based on these two data frames sharing the same ID value. I revise the ordering within list2 to produce the desired output.

# create first list of data frames
names1 <- c("df1", "df2")
# note the extra spacing in "2012-02-29" has been corrected
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635)) 
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1

# create second list of data frames
names2 <- c("df_auto1", "df_auto2")
# here is where I relabel the data frames
# to sync with `mydf1` and `mydf2` based on 
# the `ID` values contained in `mydf_auto1` and `mydf_auto2`
mydf_auto1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
mydf_auto2 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2

# merge the list of data frames together
merged.list.of.dfs <-
  mapply( FUN = function( i, j )
    merge( x = i
           , y = j
           , by = c( "ID", "DATE" )
           , all = TRUE )
    , list1
    , list2
    , SIMPLIFY = FALSE )

# view results
merged.list.of.dfs
# $df1
#       ID       DATE SCORE Fprice
# 3 115477 2012-01-31   677     NA
# 4 115477 2012-02-29   635   6543
# 
# $df2
#      ID       DATE SCORE Fprice
# 1 22319 2011-09-30   621   8708
# 2 22319 2011-10-31   630   8708

# end of script #

It would be easier for you to do a merge , then separately extract the IDs you want

names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635)) 
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
# Note the change to use of rbind instead of list
list1 <- rbind(mydf1, mydf2)

names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708)) 
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- rbind(mydf_auto1,mydf_auto2)

df <- merge(list1, list2, by = c("ID", "DATE"))
df[df$ID == 115477,]
df[df$ID == 22319, ]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM