Merge multiple data.frames by row names in R

I would like to merge multiple data.frames in R by their row.names, doing a full outer join. For this I was hoping to do the following:

x = as.data.frame(t(data.frame(a=10, b=13, c=14)))
y = as.data.frame(t(data.frame(a=1, b=2)))
z = as.data.frame(t(data.frame(a=3, b=4, c=3, d=11)))
res = Reduce(function(a,b) merge(a,b,by="row.names",all=T), list(x,y,z))

Warning message:
In merge.data.frame(a, b, by = "row.names", all = T) :
  column name ‘Row.names’ is duplicated in the result
> res
  Row.names Row.names V1.x V1.y V1
1         1         a   10    1 NA
2         2         b   13    2 NA
3         3         c   14   NA NA
4         a      <NA>   NA   NA  3
5         b      <NA>   NA   NA  4
6         c      <NA>   NA   NA  3
7         d      <NA>   NA   NA 11

What I was hoping to get is:

  V1 V2 V3
a 10  1  3
b 13  2  4
c 14 NA  3
d NA NA 11

The following works (up to some final column renaming):

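res <- Reduce(function(a,b){
        ans <- merge(a,b,by="row.names",all=TRUE)
        # restore the row names so the next merge can match on them
        row.names(ans) <- ans[,"Row.names"]
        # drop the helper column that merge() added
        ans[,!names(ans) %in% "Row.names"]
        }, list(x,y,z))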

Indeed:

> res
  V1.x V1.y V1
a   10    1  3
b   13    2  4
c   14   NA  3
d   NA   NA 11
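
The remaining column renaming can be done in one extra line. The block below is a minimal sketch that repeats the setup from the question so it runs on its own; the helper name merge_by_rownames is introduced here for illustration and is not part of the original answer.

```r
# Data from the question
x <- as.data.frame(t(data.frame(a = 10, b = 13, c = 14)))
y <- as.data.frame(t(data.frame(a = 1, b = 2)))
z <- as.data.frame(t(data.frame(a = 3, b = 4, c = 3, d = 11)))

# Hypothetical helper: full outer join on row names that keeps the row names
merge_by_rownames <- function(a, b) {
  ans <- merge(a, b, by = "row.names", all = TRUE)
  rownames(ans) <- ans[["Row.names"]]
  # drop = FALSE keeps a data.frame even if only one column is left
  ans[, names(ans) != "Row.names", drop = FALSE]
}

res <- Reduce(merge_by_rownames, list(x, y, z))

# Final renaming: V1.x, V1.y, V1 become V1, V2, V3
names(res) <- paste0("V", seq_along(res))
res
```

Renaming by position assumes each input contributes exactly one column, as it does in the question.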

What happens with a join on row names is that a column holding the original row names is added to the result, while the result itself no longer carries those row names:

> merge(x,y,by="row.names",all=T)
  Row.names V1.x V1.y
1         a   10    1
2         b   13    2
3         c   14   NA

This behavior is documented in ?merge (under Value):

If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has 'automatic' row names.

When Reduce merges the next data.frame, the intermediate result has only automatic row names (1, 2, 3), so nothing matches unless the row names are restored manually.
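
One way to sidestep the row-name bookkeeping is to move the row names into an ordinary key column before merging. The following is a sketch of that idea, not something proposed in the original answers; the column name id and the variable names dfs and keyed are assumptions made for illustration, and it assumes every data.frame has a single value column.

```r
# Data from the question
x <- as.data.frame(t(data.frame(a = 10, b = 13, c = 14)))
y <- as.data.frame(t(data.frame(a = 1, b = 2)))
z <- as.data.frame(t(data.frame(a = 3, b = 4, c = 3, d = 11)))
dfs <- list(x, y, z)

# Copy the row names into a key column "id" and give each value column
# a unique name (V1, V2, ...), so merge() never needs suffixes
keyed <- lapply(seq_along(dfs), function(i) {
  d <- dfs[[i]]
  names(d) <- paste0("V", i)  # assumes exactly one column per data.frame
  data.frame(id = rownames(d), d, stringsAsFactors = FALSE)
})

# Full outer join on the explicit key column
res <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), keyed)

# Turn the key back into row names
rownames(res) <- res$id
res$id <- NULL
res
```

Because the key is a regular column, it survives every intermediate merge and no Row.names column is created.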

For completeness: this is not a clean solution but a workaround. I transform the list argument of Reduce using sapply. Note that the result loses the original row names (the rows come back numbered 1 to 4) and the warning is still emitted:

Reduce(function(a,b) merge(a,b,by=0,all=T),
                      sapply(list(x,y,z),rbind))[,-c(1,2)]
   x y.x y.y
1 10   1   3
2 13   2   4
3 14  NA   3
4 NA  NA  11
Warning message:
In merge.data.frame(a, b, by = 0, all = T) :
  column name ‘Row.names’ is duplicated in the result

For some reason I did not have much success with Reduce. Given a list of data.frames (df.lst) and a list of suffixes (suff.lst) used to rename identically named columns, this is my solution (it's a loop, which I know is ugly by R standards, but it works):

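# Start from the first data.frame; every column except the first (presumably the key) gets a suffix
df.merg <- as.data.frame(df.lst[[1]])
colnames(df.merg)[-1] <- paste(colnames(df.merg)[-1],suff.lst[[1]],sep="")
for (i in seq_along(df.lst)[-1]) {
    df.i <- as.data.frame(df.lst[[i]])
    colnames(df.i)[-1] <- paste(colnames(df.i)[-1],suff.lst[[i]],sep="")
    # by.x/by.y are blank in the original post; put the key column name here
    df.merg <- merge(df.merg, df.i, by.x="",by.y="", all=TRUE)
}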
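
The loop leaves the key column name in by.x/by.y blank. A runnable sketch under the assumption that the key sits in the first column of every data.frame and is called id could look like this; the sample data, the suffix values and the helper add_suffix are hypothetical and only serve to make the example self-contained.

```r
# Hypothetical inputs: the key is the first column, here named "id"
df.lst <- list(
  data.frame(id = c("a", "b", "c"), value = c(10, 13, 14)),
  data.frame(id = c("a", "b"), value = c(1, 2)),
  data.frame(id = c("a", "b", "c", "d"), value = c(3, 4, 3, 11))
)
suff.lst <- list(".x", ".y", ".z")

# Hypothetical helper: append a suffix to every column except the key
add_suffix <- function(df, suffix) {
  names(df)[-1] <- paste0(names(df)[-1], suffix)
  df
}

df.merg <- add_suffix(df.lst[[1]], suff.lst[[1]])
for (i in seq_along(df.lst)[-1]) {
  df.i <- add_suffix(df.lst[[i]], suff.lst[[i]])
  # Full outer join on the assumed key column
  df.merg <- merge(df.merg, df.i, by = "id", all = TRUE)
}
df.merg
```

Suffixing the columns before merging means merge() never has to invent .x/.y names itself, which keeps the column names predictable for any number of inputs.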
