
R: merging two data frames

I want to merge two data frames, but some values in the name column are repeated. If a name appears a different number of times in the two data frames, I would like the column from the data frame with fewer occurrences to show NA.

My example:

test1 <- data.frame(name = c("A", "B", "C", "C", "C", "D"), n1 = c("15", "14", "13", "12", "11", "10"))
test2 <- data.frame(name = c("A", "B", "B", "C", "C", "D"), n1 = c("30", "31", "33", "39", "38", "40")) 

When I merge by name, I get:

name n1.x n1.y
A    15   30
B    14   31
B    14   33
C    13   39
C    13   38
C    12   39
C    12   38
C    11   39
C    11   38
D    10   40

The repeated names are matched in every combination. What I want is:

name n1.x n1.y
A    15   30
B    14   31
B    NA   33
C    13   39
C    12   38
C    11   NA
D    10   40

What command should I use? Thank you very much!

Try:

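# Number the repeated names within each data frame (1st "C", 2nd "C", ...)
test1$indx <- with(test1, ave(seq_len(nrow(test1)), name, FUN = seq_along))
test2$indx <- with(test2, ave(seq_len(nrow(test2)), name, FUN = seq_along))
# Full outer join on name and occurrence number, then drop the helper column
merge(test1, test2, by = c("name", "indx"), all = TRUE)[, -2]
#   name n1.x n1.y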
# 1    A   15   30
# 2    B   14   31
# 3    B <NA>   33
# 4    C   13   39
# 5    C   12   38
# 6    C   11 <NA>
# 7    D   10   40
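
The merge above works because `indx` numbers the repeated occurrences of each name, so the first "C" in `test1` is paired only with the first "C" in `test2`, and so on. A minimal sketch of what that counter looks like for `test1` (the variable name `occurrence` is only for illustration):

```r
test1 <- data.frame(name = c("A", "B", "C", "C", "C", "D"),
                    n1 = c("15", "14", "13", "12", "11", "10"))

# Number each row within its name group: 1 for the first occurrence, 2 for the second, ...
occurrence <- ave(seq_len(nrow(test1)), test1$name, FUN = seq_along)
occurrence
# [1] 1 1 1 2 3 1
```

Merging on both `name` and this counter with `all = TRUE` keeps unmatched occurrences and fills the missing side with NA.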

I will post this before the data.table people come in with a slick, scalable and quick solution.

Be warned that this works for the provided data set. You should examine the results of your production code carefully.

What the code below does is stick together the values for each common level of name. The rest is just bookkeeping.

# One list slot per name, in order of first appearance in test1
ml <- vector("list", length(unique(test1$name)))
names(ml) <- unique(test1$name)

for (i in unique(test1$name)) {
  # Rows for this name in each data frame
  o1 <- test1[test1$name %in% i, , drop = FALSE]
  o2 <- test2[test2$name %in% i, , drop = FALSE]
  # The longer group sets the row count; the shorter one stays padded with NA
  o.max <- max(nrow(o1), nrow(o2))
  out <- matrix(NA_real_, nrow = o.max, ncol = 2)
  # seq_len() is safe even when a name has no rows in one data frame
  out[seq_len(nrow(o1)), 1] <- as.numeric(as.character(o1$n1))
  out[seq_len(nrow(o2)), 2] <- as.numeric(as.character(o2$n1))

  ml[[i]] <- out
}

count.each <- sapply(ml, nrow)
result <- do.call("rbind", ml)
colnames(result) <- c("n1.x", "n1.y")
data.frame(name = rep(names(ml), count.each), result)

  name n1.x n1.y
1    A   15   30
2    B   14   31
3    B   NA   33
4    C   13   39
5    C   12   38
6    C   11   NA
7    D   10   40
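
One thing to check before relying on the loop above: it iterates only over the names found in `test1`, so a name that occurs only in `test2` would be dropped. The sketch below is one possible way around that, not part of the original answer. The helper name `pair_by_occurrence` is hypothetical, it assumes the columns are called `name` and `n1` as in the question, and the extra "E" row in `test2` is added here purely to show the case.

```r
# Hypothetical helper: pair the k-th occurrence of each name, padding with NA.
pair_by_occurrence <- function(x, y) {
  # Use names found in either data frame, not just the first one
  keys <- union(as.character(x$name), as.character(y$name))
  pieces <- lapply(keys, function(k) {
    vx <- as.numeric(as.character(x$n1[x$name == k]))
    vy <- as.numeric(as.character(y$n1[y$name == k]))
    n <- max(length(vx), length(vy))
    # Indexing past the end of a vector yields NA, which pads the shorter side
    data.frame(name = rep(k, n), n1.x = vx[seq_len(n)], n1.y = vy[seq_len(n)])
  })
  do.call(rbind, pieces)
}

test1 <- data.frame(name = c("A", "B", "C", "C", "C", "D"),
                    n1 = c("15", "14", "13", "12", "11", "10"))
# Same as the question's test2, plus an assumed extra name "E" that test1 lacks
test2 <- data.frame(name = c("A", "B", "B", "C", "C", "D", "E"),
                    n1 = c("30", "31", "33", "39", "38", "40", "50"))

res <- pair_by_occurrence(test1, test2)
res
```

The first seven rows match the result shown above, and "E" appears as an eighth row with NA in `n1.x`.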
