Merging data sets in R

I am facing a small problem while merging two data.frames in R.

I am trying to merge two data.frames that have the same column names, and I would like R to merge each pair of same-named columns into one column instead of keeping them as two separate columns.

Typically, when R encounters same-named columns while merging data.frames, it creates two separate variables with the suffixes ".x" and ".y". Is there a way to tell the merge command to treat same-named columns in the different data sets as one column/variable?

The code that we could use as an example:

x = data.frame(id = c("a","c","d","g"),
               maths = c(1,3,4,7), physics = c(1,3,4,7), chemistry = c(1,3,4,7),
               english = c(1,3,4,7))
y = data.frame(id = c("b","c","d","e","f"),
               maths = c(5,6,8,9,7), physics = c(5,6,8,9,7), chemistry = c(5,6,8,9,7),
               english = c(5,6,8,9,7))

xy <- merge(x, y, by = "id")
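
To make the behaviour concrete, here is a small sketch of what that call returns for the example data. The data frames are rebuilt so the snippet runs on its own; `stringsAsFactors = FALSE` is an addition of mine so that `id` is a character column on any R version.

```r
x <- data.frame(id = c("a", "c", "d", "g"),
                maths = c(1, 3, 4, 7), physics = c(1, 3, 4, 7),
                chemistry = c(1, 3, 4, 7), english = c(1, 3, 4, 7),
                stringsAsFactors = FALSE)
y <- data.frame(id = c("b", "c", "d", "e", "f"),
                maths = c(5, 6, 8, 9, 7), physics = c(5, 6, 8, 9, 7),
                chemistry = c(5, 6, 8, 9, 7), english = c(5, 6, 8, 9, 7),
                stringsAsFactors = FALSE)

# Inner join on id: only "c" and "d" appear in both data frames.
xy <- merge(x, y, by = "id")

# Every shared non-key column is kept twice, once with ".x" and once with ".y".
names(xy)
nrow(xy)
```

The result has two rows and nine columns: `id`, then the four `.x` columns from `x`, then the four `.y` columns from `y`.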

There is a workaround: create a new variable in the merged data set that takes the non-NA values from the same-named columns. However, this is very inefficient if you have a large number of columns.

SAS users will recognize this problem. It was brought to my attention by an experienced SAS user, because in SAS the merge statement combines two same-named columns into one column.
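
One way that workaround might look when written as a loop over the shared columns, so that it does not have to be typed out per column. This is only a sketch: the names `shared` and `combined` are hypothetical, and preferring the value from `x` whenever both sides are non-NA is an assumption of mine, not something the question specifies.

```r
x <- data.frame(id = c("a", "c", "d", "g"),
                maths = c(1, 3, 4, 7), physics = c(1, 3, 4, 7),
                chemistry = c(1, 3, 4, 7), english = c(1, 3, 4, 7),
                stringsAsFactors = FALSE)
y <- data.frame(id = c("b", "c", "d", "e", "f"),
                maths = c(5, 6, 8, 9, 7), physics = c(5, 6, 8, 9, 7),
                chemistry = c(5, 6, 8, 9, 7), english = c(5, 6, 8, 9, 7),
                stringsAsFactors = FALSE)

# Full outer join on id; shared columns come back as <name>.x and <name>.y.
xy <- merge(x, y, by = "id", all = TRUE)

# Columns present in both inputs, apart from the key.
shared <- setdiff(intersect(names(x), names(y)), "id")

# For each shared column, take the .x value and fall back to .y when it is NA.
combined <- xy["id"]
for (col in shared) {
  from_x <- xy[[paste0(col, ".x")]]
  from_y <- xy[[paste0(col, ".y")]]
  combined[[col]] <- ifelse(is.na(from_x), from_y, from_x)
}

combined
```

Note that for ids present in both data frames ("c" and "d") this keeps only the value from `x` and silently discards the one from `y`, which may or may not be what is wanted.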

Also, as one of the answers below mentions, if we use:

xy <- merge(x, y, by = intersect(names(x), names(y)))

We get no intersection between the two data.frames, because no row has the same values in all five columns. Ideally there would be 4 observations here, 2 for each of the ids that appear in both data.frames, id = c("c","d").

I would be grateful if any experienced R users could help me out on this one.

Thanks!
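
A short sketch of both points: the merge on all shared columns comes back empty, and the four desired rows can be obtained by stacking the rows whose id occurs in both inputs. The names `common_ids` and `stacked` are hypothetical, and stacking with `rbind` is just one possible reading of the desired output.

```r
x <- data.frame(id = c("a", "c", "d", "g"),
                maths = c(1, 3, 4, 7), physics = c(1, 3, 4, 7),
                chemistry = c(1, 3, 4, 7), english = c(1, 3, 4, 7),
                stringsAsFactors = FALSE)
y <- data.frame(id = c("b", "c", "d", "e", "f"),
                maths = c(5, 6, 8, 9, 7), physics = c(5, 6, 8, 9, 7),
                chemistry = c(5, 6, 8, 9, 7), english = c(5, 6, 8, 9, 7),
                stringsAsFactors = FALSE)

# Joining on every shared column requires all five values to match,
# which never happens here, so the result has no rows.
by_all <- merge(x, y, by = intersect(names(x), names(y)))
nrow(by_all)

# Keep the rows of each input whose id occurs in both, then stack them.
common_ids <- intersect(x$id, y$id)
stacked <- rbind(x[x$id %in% common_ids, ], y[y$id %in% common_ids, ])
stacked <- stacked[order(stacked$id, stacked$maths), ]
stacked
```

Here `stacked` has four rows, two for "c" and two for "d", with the original five columns and no suffixes.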

Do you really want to merge, or is rbind(x, y) what you are looking for? In your example it gives the same data.frame as the merge calls below (after sorting by id). If you actually want to merge the data.frames, you have to specify the names of the columns that you do not want duplicated:

merge(x, y, all=TRUE)
merge(x, y, by = c("id", "maths", "physics", "chemistry", "english"), all = TRUE)
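
The claim that `rbind` and the full merge agree on this example can be checked with a sketch like the following. Both results are sorted and their row names reset before comparing; the sort on `id` and `maths` is my own choice for lining the rows up.

```r
x <- data.frame(id = c("a", "c", "d", "g"),
                maths = c(1, 3, 4, 7), physics = c(1, 3, 4, 7),
                chemistry = c(1, 3, 4, 7), english = c(1, 3, 4, 7),
                stringsAsFactors = FALSE)
y <- data.frame(id = c("b", "c", "d", "e", "f"),
                maths = c(5, 6, 8, 9, 7), physics = c(5, 6, 8, 9, 7),
                chemistry = c(5, 6, 8, 9, 7), english = c(5, 6, 8, 9, 7),
                stringsAsFactors = FALSE)

# With no 'by' given, merge() joins on all column names the inputs share.
merged  <- merge(x, y, all = TRUE)
stacked <- rbind(x, y)

# Put both results in the same row order and drop row names before comparing.
merged  <- merged[order(merged$id, merged$maths), ]
stacked <- stacked[order(stacked$id, stacked$maths), ]
rownames(merged)  <- NULL
rownames(stacked) <- NULL

nrow(merged)
all.equal(merged, stacked)
```

The two only agree because no complete row occurs in both `x` and `y`. If an identical row were present in both, `merge` would return it once while `rbind` would keep both copies.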

Here is my quick solution.

Hope it helps. Note that the first column of x is the id I join on. The other columns of x that also exist in y are dropped before merging, so the values for those columns come from y.

output <- merge(x[names(x) == "id" | !(names(x) %in% names(y))], y, by = "id", all = TRUE)
