
Merging two lists of dataframes using R

I would like to merge two lists of data frames by a common id variable. Consider the following example:

set.seed(1)
mylist1 <- data.frame(id = sample(paste0("id", sample(1:5, 10, TRUE))),
                      var1 = sample(letters, 10, TRUE), stringsAsFactors = FALSE)
mylist1 <- split(mylist1, mylist1$id)
set.seed(2)
mylist2 <- data.frame(id = sample(paste0("id", sample(1:5, 10, TRUE))),
                      var2 = sample(LETTERS, 10, TRUE), stringsAsFactors = FALSE)
mylist2 <- split(mylist2, mylist2$id)

mylist1
# $id1
# id     var1
# id1    d
# 
# $id2
# id     var1
# id2    f
# id2    g
# id2    w
# etc.

mylist2
# $id1
# id     var2
# id1    V
# id1    D
# id1    J
# 
# $id3
# id     var2
# id3    K
# id3    J
# id3    Z
# etc.

The resulting list of data frames should look like this:

# $id1
# id  var1 var2
# id1 d    V
# id1 d    D
# id1 d    J

# $id2
# id  var1 var2
# id2 f    NA
# id2 g    NA
# id2 w    NA
# etc.

Do you know how I could do this?

We can use Map to do this. In the example data, only some list elements (identified by their names) are present in both lists.
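To see which elements are shared, compare the two sets of names. The sketch below hard-codes the ids that appear in the printed results of this example (the vectors n1 and n2 are stand-ins for names(mylist1) and names(mylist2), assumed from that output rather than recomputed from the seeded data):

```r
# Stand-ins for names(mylist1) and names(mylist2), taken from the printed output
n1 <- c("id1", "id2", "id3", "id4", "id5")
n2 <- c("id1", "id3", "id4", "id5")

intersect(n1, n2)  # ids present in both lists
setdiff(n1, n2)    # "id2": present only in the first list
union(n1, n2)      # every id that occurs in either list
```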

The first step is to collect every name that appears in either list with union ('nm1'). We then subset each list by those names to get 'lst1' and 'lst2'. Where a list lacks one of the names, that position holds a NULL element.

nm1 <- union(names(mylist1), names(mylist2))
lst1 <- mylist1[nm1]
lst2 <- mylist2[nm1]
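In the question's data every id occurs in mylist1, so this does not show up here, but it helps to know exactly how `[` behaves on a list when a requested name is absent: the element is NULL and its name becomes NA. Because Map takes the names of its result from its first list argument, an id found only in mylist2 would come out labelled NA unless the names are restored. A minimal illustration with toy objects x and y (hypothetical, not from the question):

```r
# Subsetting a list by a name it does not contain yields a NULL element,
# and the name at that position becomes NA
x <- list(a = data.frame(id = "a", v = 1))
y <- x[c("a", "b")]

length(y)        # 2
is.null(y[[2]])  # TRUE
names(y)         # "a" NA

# Restoring the intended names keeps later Map() output labelled correctly
names(y) <- c("a", "b")
names(y)         # "a" "b"
```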

Next, we replace each NULL element with a placeholder data.frame, using if/else inside lapply.

lst1 <- lapply(lst1, function(x)
  if (is.null(x)) data.frame(id = NA, var1 = NA) else x)
lst2 <- lapply(lst2, function(x)
  if (is.null(x)) data.frame(id = NA, var2 = NA) else x)
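The data.frame(id = NA, var2 = NA) placeholder has one row whose id is NA. Since merge(..., all = TRUE) keeps unmatched rows from both sides, that row survives as the extra `<NA> <NA> NA` line under $id2 in the merged output, which the question's desired result does not have. One possible alternative (a swapped-in tweak, not the answer's method) is a zero-row placeholder with the same columns, so merge() just fills the missing side with NA. The toy data frame `a` below is hypothetical:

```r
a <- data.frame(id = c("id2", "id2"), var1 = c("f", "g"), stringsAsFactors = FALSE)

# One-row NA placeholder, as in the answer: the NA row is kept by all = TRUE
na_row <- data.frame(id = NA, var2 = NA)
m1 <- merge(a, na_row, by = "id", all = TRUE)
nrow(m1)  # 3

# Zero-row placeholder with the same columns: no spurious all-NA row
empty <- data.frame(id = character(0), var2 = character(0), stringsAsFactors = FALSE)
m2 <- merge(a, empty, by = "id", all = TRUE)
nrow(m2)  # 2, and var2 is NA in both rows
```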

Finally, we merge the two lists with Map, which applies merge to each pair of corresponding elements. Instead of writing an anonymous function, we pass the extra arguments that merge needs (by and all) through MoreArgs.

Map(merge, lst1, lst2, MoreArgs = list(by = 'id', all = TRUE))
#$id1
#   id var1 var2
#1 id1    d    V
#2 id1    d    D
#3 id1    d    J

#$id2
#    id var1 var2
#1  id2    f   NA
#2  id2    g   NA
#3  id2    w   NA
#4 <NA> <NA>   NA

#$id3
#   id var1 var2
#1 id3    y    K
#2 id3    y    J
#3 id3    y    Z

#$id4
#   id var1 var2
#1 id4    a    D
#2 id4    i    D

#$id5
#   id var1 var2
#1 id5    q    R
#2 id5    q    M
#3 id5    q    D
#4 id5    k    R
#5 id5    k    M
#6 id5    k    D
#7 id5    j    R
#8 id5    j    M
#9 id5    j    D
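Putting the steps together, here is a small self-contained sketch. merge_df_lists is a hypothetical helper name, and the sketch assumes that both lists are non-empty, that every data frame within a list has the same columns, and that the key column is called id. It departs slightly from the steps above: it uses zero-row placeholders instead of one-row NA data frames, and it sets the result names explicitly.

```r
# Hypothetical helper: merge two named lists of data frames element-wise by id
merge_df_lists <- function(l1, l2, by = "id") {
  nm <- union(names(l1), names(l2))

  # Zero-row templates with the right columns (assumes non-empty lists)
  tmpl1 <- l1[[1]][0, , drop = FALSE]
  tmpl2 <- l2[[1]][0, , drop = FALSE]

  # Align both lists on the full set of names; [[ returns NULL for a
  # missing name, which is replaced by the zero-row template
  a <- lapply(nm, function(k) if (is.null(l1[[k]])) tmpl1 else l1[[k]])
  b <- lapply(nm, function(k) if (is.null(l2[[k]])) tmpl2 else l2[[k]])
  names(a) <- nm  # Map() takes the result names from its first list

  Map(merge, a, b, MoreArgs = list(by = by, all = TRUE))
}

# Usage with small hand-built inputs (illustrative values only)
d1 <- data.frame(id = c("id1", "id2", "id2"), var1 = c("d", "f", "g"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = c("id1", "id1", "id3"), var2 = c("V", "D", "K"),
                 stringsAsFactors = FALSE)
res <- merge_df_lists(split(d1, d1$id), split(d2, d2$id))
names(res)     # "id1" "id2" "id3"
nrow(res$id2)  # 2, with var2 NA in both rows
```

Passing `MoreArgs` is equivalent to writing `Map(function(x, y) merge(x, y, by = by, all = TRUE), a, b)`; it just avoids the anonymous function.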
