Nested full_join with suffixes for more than 2 data.frames

I want to merge several data.frames that share some columns, and append a suffix to the column names to keep track of which data.frame each column comes from.

I can do it easily with the suffix argument in the first full_join, but when I do the second join, no suffixes are added. I could rename the columns of the third data.frame so that they already carry a suffix, but I would like to know whether there is a way to do it with the suffix argument.

Here is an example:

library(dplyr)  # provides full_join() and %>%

x = data.frame(col1 = c("a","b","c"), col2 = 1:3, col3 = 1:3)
y = data.frame(col1 = c("b","c","d"), col2 = 4:6, col3 = 1:3)
z = data.frame(col1 = c("c","d","a"), col2 = 7:9, col3 = 1:3)

> df = full_join(x, y, by = "col1", suffix = c("_x","_y")) %>% 
  full_join(z, by = "col1", suffix = c("","_z")) 

> df
  col1 col2_x col3_x col2_y col3_y col2 col3
1    a      1      1     NA     NA    9    3
2    b      2      2      4      1   NA   NA
3    c      3      3      5      2    7    1
4    d     NA     NA      6      3    8    2

I was expecting col2 and col3 from data.frame z to get a "_z" suffix. An empty suffix does work when I merge just two data.frames.

I can work around this by renaming the columns in z before the second full_join, but my real data has several common columns, and merging more data.frames would complicate the code. This is my expected output:

> colnames(z) = paste0(colnames(z),"_z")

> df = full_join(x, y, by = "col1", suffix = c("_x","_y")) %>% 
  full_join(z, by = c("col1"="col1_z"))

> df
  col1 col2_x col3_x col2_y col3_y col2_z col3_z
1    a      1      1     NA     NA      9      3
2    b      2      2      4      1     NA     NA
3    c      3      3      5      2      7      1
4    d     NA     NA      6      3      8      2

I have seen similar questions that add an extra column to keep track of the source data.frame, but I was wondering why the suffix argument does not work across multiple joins.

PS: If I keep the first suffix empty, I can add suffixes in the second join, but that leaves col2 and col3 from x without a suffix.

> df = full_join(x, y, by = "col1", suffix = c("","_y")) %>% 
  full_join(z, by = "col1", suffix = c("","_z"))

> df
  col1 col2 col3 col2_y col3_y col2_z col3_z
1    a    1    1     NA     NA      9      3
2    b    2    2      4      1     NA     NA
3    c    3    3      5      2      7      1
4    d   NA   NA      6      3      8      2

You can do it like this:

full_join(x, y, by = "col1", suffix = c("","_y")) %>% 
  full_join(z, by = "col1", suffix = c("_x","_z"))

  col1 col2_x col3_x col2_y col3_y col2_z col3_z
1    a      1      1     NA     NA      9      3
2    b      2      2      4      1     NA     NA
3    c      3      3      5      2      7      1
4    d     NA     NA      6      3      8      2

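Adding the suffix for x at the last join should do the trick. suffix is only applied to non-key columns whose names occur in both inputs, so x's columns have to keep their bare names col2 and col3 until the join with z, where they finally clash and pick up "_x".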
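
For more than three data.frames, juggling suffixes by hand gets awkward. Below is a rough sketch of one alternative, not part of the answer above: it first checks which names actually clash in the question's second join, then renames the non-key columns up front and folds the joins over a named list. The helper name join_with_suffixes is made up for illustration, and rename_with() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

x <- data.frame(col1 = c("a", "b", "c"), col2 = 1:3, col3 = 1:3)
y <- data.frame(col1 = c("b", "c", "d"), col2 = 4:6, col3 = 1:3)
z <- data.frame(col1 = c("c", "d", "a"), col2 = 7:9, col3 = 1:3)

# After the first join, x's and y's columns are already suffixed, so the only
# name shared with z is the key. Nothing is left for suffix to disambiguate.
xy <- full_join(x, y, by = "col1", suffix = c("_x", "_y"))
clash <- intersect(names(xy), names(z))

# Hypothetical helper: append "_<list name>" to every non-key column,
# then full_join all data.frames in order.
join_with_suffixes <- function(dfs, by) {
  renamed <- Map(function(df, name) {
    rename_with(df, function(nm) paste0(nm, "_", name), .cols = -all_of(by))
  }, dfs, names(dfs))
  Reduce(function(a, b) full_join(a, b, by = by), renamed)
}

df <- join_with_suffixes(list(x = x, y = y, z = z), by = "col1")
names(df)
```

The names of the list supply the suffixes, so adding a fourth data.frame only means adding one more list element.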
