R dplyr full_join - no common key, need the common columns merged together

Question

For example, I have these two data frames:

dates = c('2020-11-19', '2020-11-20', '2020-11-21')
df1 <- data.frame(dates, area = c('paris', 'london', 'newyork'), 
                  rating = c(10, 5, 6),
                  rating2 = c(5, 6, 7))

df2 <- data.frame(dates, area = c('budapest', 'moscow', 'valencia'), 
                  rating = c(1, 2, 1))

> df1
       dates    area rating rating2
1 2020-11-19   paris     10       5
2 2020-11-20  london      5       6
3 2020-11-21 newyork      6       7
> df2
       dates     area rating
1 2020-11-19 budapest      1
2 2020-11-20   moscow      2
3 2020-11-21 valencia      1

When I perform an outer join with dplyr:

library(dplyr)

df <- df1 %>%
  full_join(df2, by = c('dates', 'area'))

the result looks like this:

       dates     area rating.x rating2 rating.y
1 2020-11-19    paris       10       5       NA
2 2020-11-20   london        5       6       NA
3 2020-11-21  newyork        6       7       NA
4 2020-11-19 budapest       NA      NA        1
5 2020-11-20   moscow       NA      NA        2
6 2020-11-21 valencia       NA      NA        1

That is, the rating columns from the two data frames are not merged; two separate columns are created instead.

How can I get a result like this?

       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA

Answer 1

What you are looking for is dplyr::bind_rows(), which keeps the common columns and fills in NA for columns that exist in only one of the data frames:

> bind_rows(df1, df2)
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA
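
For a version that can be pasted into a fresh session, a minimal sketch might look like the following. It assumes dplyr is installed and rebuilds the data frames from the question; the name stacked is only illustrative.

```r
library(dplyr)

# Rebuild the two data frames from the question
dates <- c("2020-11-19", "2020-11-20", "2020-11-21")
df1 <- data.frame(dates, area = c("paris", "london", "newyork"),
                  rating = c(10, 5, 6),
                  rating2 = c(5, 6, 7))
df2 <- data.frame(dates, area = c("budapest", "moscow", "valencia"),
                  rating = c(1, 2, 1))

# Stack the rows; columns are matched by name, so df2's rows
# get NA in rating2, the one column df2 does not have
stacked <- bind_rows(df1, df2)
stacked
```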

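Note that you can also keep using full_join(), but if you do not want columns to be split, you must make sure that every column common to both data frames is included as a key: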

> full_join(
+   df1, df2,
+   by = c("dates", "area", "rating")
+ )
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA

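The dplyr documentation for joins mentions:

Output columns include all x columns and all y columns. If columns in x and y have the same name (and aren't included in by), suffixes are added to disambiguate.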
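
To make the suffix rule visible, here is a small sketch that repeats the join from the question with only dates and area as keys. It uses the suffix argument of full_join(); the suffix values .df1 and .df2 and the name res are chosen here purely for illustration.

```r
library(dplyr)

dates <- c("2020-11-19", "2020-11-20", "2020-11-21")
df1 <- data.frame(dates, area = c("paris", "london", "newyork"),
                  rating = c(10, 5, 6),
                  rating2 = c(5, 6, 7))
df2 <- data.frame(dates, area = c("budapest", "moscow", "valencia"),
                  rating = c(1, 2, 1))

# rating exists in both data frames but is not listed in `by`,
# so it is split into two columns that carry the given suffixes
res <- full_join(df1, df2, by = c("dates", "area"),
                 suffix = c(".df1", ".df2"))
names(res)
```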

You can also avoid this problem by not specifying by, in which case dplyr will use all common columns.

> full_join(df1, df2)
Joining, by = c("dates", "area", "rating")
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA
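
If you would rather state the keys explicitly than rely on the default (and its message), one option is to compute the shared column names yourself. This is only a sketch under the assumption that df1 and df2 are the data frames from the question; common_cols and joined are illustrative names.

```r
library(dplyr)

dates <- c("2020-11-19", "2020-11-20", "2020-11-21")
df1 <- data.frame(dates, area = c("paris", "london", "newyork"),
                  rating = c(10, 5, 6),
                  rating2 = c(5, 6, 7))
df2 <- data.frame(dates, area = c("budapest", "moscow", "valencia"),
                  rating = c(1, 2, 1))

# Column names present in both data frames, in df1's order
common_cols <- intersect(names(df1), names(df2))

# Join on every shared column so that none of them is split
joined <- full_join(df1, df2, by = common_cols)
joined
```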

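As far as I can tell, either approach works for your use case. In fact, I believe the practical advantage of full_join() over bind_rows() is exactly the behavior you want to avoid here: splitting columns that are not keys.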
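
Keep in mind that the two approaches only coincide here because no row of df1 matches a row of df2 on all common columns. The sketch below uses two small hypothetical data frames, a and b (not from the question), that share one identical row, to show where the results diverge.

```r
library(dplyr)

# Hypothetical data: the "paris" row has the same area and rating in both
a <- data.frame(area = c("paris", "london"), rating = c(10, 5),
                rating2 = c(5, 6))
b <- data.frame(area = c("paris", "moscow"), rating = c(10, 2))

# bind_rows() simply stacks, so the paris row appears twice
n_bound <- nrow(bind_rows(a, b))

# full_join() on all common columns merges the matching paris row into one
n_joined <- nrow(full_join(a, b, by = c("area", "rating")))

c(bind_rows = n_bound, full_join = n_joined)
```

So if the same observation can occur in both inputs, the choice between the two functions decides whether it ends up duplicated.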

Solution 1
4 accepted 2021-12-17 18:26:47

R dplyr full_join - 沒有公共鍵，需要公共列混合在一起

問題描述

1 個解決方案

解決方案1 4 已采納 2021-12-17 18:26:47

解決方案1
4 已采納 2021-12-17 18:26:47