Combining 3 versions of the same table in R

Question

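I scraped some data from a website, but the scrape is really janky and, for some reason, the result has a few errors in it. So I scraped the same data 3 times and generated 3 tables that look like this: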

library(data.table)
df1 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(2, 1, 3, 4),
                  otherthing = c(2,1, 3, 4)
                  )

df2 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 1, 4),
                  otherthing = c(2,2, 3, 4)
)

df3 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 3, 4),
                  otherthing = c(2,1, 3, 3)
)

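Except that I have many more columns. I want to combine the 3 tables, and when the values of "thing", "otherthing", etc. conflict, I want it to pick the value that at least 2 of the 3 tables agree on, and return NA if no value appears in at least 2 of the 3. I am confident that the "name" and "id" fields are good; they are what I want to merge on.

I was thinking about renaming the columns to "thing1", "thing2" and "thing3" in the 3 tables respectively, merging them, and then writing some loops over those names. Is there a more elegant solution? It needs to work for 300+ value columns, although I am not too concerned about speed.

In this example, I think the solution should be: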

final_result <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
                  id = c(1, 2, 3, 4),
                  thing=c(1, 1, 3, 4),
                  otherthing = c(2,1, 3, 4)
)

Answer 1

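To generalize the approach from @IceCreamToucan, we can use:

library(dplyr)

# Return the value that occurs at least twice, or NA if every value is different.
n_mode <- function(...) {
  x <- table(c(...))
  if (any(x > 1)) as.numeric(names(x)[which.max(x)])
  else NA_real_
}

# Stack the three tables, then take the majority value of every column per (name, id).
# across() replaces the deprecated summarise_all(funs(...)) and needs dplyr >= 1.0.
bind_rows(df1, df2, df3) %>%
  group_by(name, id) %>%
  summarise(across(everything(), n_mode), .groups = "drop")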

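Be mindful of your namespace and of how you name the function: use something like n_mode() to avoid a conflict with base::mode. Finally, if you extend this to more data.frames, you will probably want to keep them in a list. If that is not possible or practical, you can replace bind_rows with purrr::map_df(ls(pattern = "^df[[:digit:]]+"), get)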
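The list-based variant mentioned above can be sketched in base R roughly as follows. This is only an illustration under assumptions: the tables live in the current environment and are named df1, df2, and so on; the names `env`, `dfs` and `stacked` are made up for the example; and small plain data.frames stand in for the real tables.

```r
# Toy stand-ins for the scraped tables (plain data.frames, two rows each).
df1 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(2, 1))
df2 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(1, 1))
df3 <- data.frame(name = c("adam", "bob"), id = c(1, 2), thing = c(1, 1))

# Collect every object whose name is "df" followed by digits into a named list.
env <- environment()
dfs <- mget(ls(pattern = "^df[[:digit:]]+$", envir = env), envir = env)

# Stack the list row-wise: one row per (table, name) combination.
stacked <- do.call(rbind, dfs)
nrow(stacked)
```

Once the tables are in a list, that list can be passed directly to bind_rows(dfs) or rbindlist(dfs) instead of naming each table by hand.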

Answer 2

A data.table version of Jason's solution (you should leave his as the accepted answer)

library(data.table)
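# Return the value that occurs at least twice, or NA if every value is different.
n_mode <- function(x) {
  x <- table(x)
  if (any(x > 1)) as.numeric(names(x)[which.max(x)])
  else NA_real_  # typed NA keeps the column numeric in every group
}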

my_list <- list(df1, df2, df3)

rbindlist(my_list)[, lapply(.SD, n_mode), .(name, id)]

#    name id thing otherthing
# 1: adam  1     1          2
# 2:  bob  2     1          1
# 3: carl  3     3          3
# 4:  dan  4     4          4
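As a quick sanity check, the result can be compared with the table the question expects. The sketch below repeats the sample data so that it runs on its own; `result` and `same` are names introduced only for this illustration.

```r
library(data.table)

# Sample tables from the question.
df1 <- data.table(name = c("adam", "bob", "carl", "dan"), id = c(1, 2, 3, 4),
                  thing = c(2, 1, 3, 4), otherthing = c(2, 1, 3, 4))
df2 <- data.table(name = c("adam", "bob", "carl", "dan"), id = c(1, 2, 3, 4),
                  thing = c(1, 1, 1, 4), otherthing = c(2, 2, 3, 4))
df3 <- data.table(name = c("adam", "bob", "carl", "dan"), id = c(1, 2, 3, 4),
                  thing = c(1, 1, 3, 4), otherthing = c(2, 1, 3, 3))

# Expected answer as stated in the question.
final_result <- data.table(name = c("adam", "bob", "carl", "dan"), id = c(1, 2, 3, 4),
                           thing = c(1, 1, 3, 4), otherthing = c(2, 1, 3, 4))

# Value seen at least twice, otherwise a numeric NA.
n_mode <- function(x) {
  x <- table(x)
  if (any(x > 1)) as.numeric(names(x)[which.max(x)]) else NA_real_
}

result <- rbindlist(list(df1, df2, df3))[, lapply(.SD, n_mode), by = .(name, id)]

# Compare column by column against the expected table.
same <- vapply(names(final_result),
               function(col) identical(result[[col]], final_result[[col]]),
               logical(1))
all(same)
```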

Here is the output of rbindlist. Hopefully it makes it clearer why simply taking n_mode of every column, grouped by name and id, gives the desired output.

rbindlist(my_list)[order(name, id)]

#     name id thing otherthing
#  1: adam  1     2          2
#  2: adam  1     1          2
#  3: adam  1     1          2
#  4:  bob  2     1          1
#  5:  bob  2     1          2
#  6:  bob  2     1          1
#  7: carl  3     3          3
#  8: carl  3     1          3
#  9: carl  3     3          3
# 10:  dan  4     4          4
# 11:  dan  4     4          4
# 12:  dan  4     4          3
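The stacked output shows that each (name, id) group is just a short vector of candidate values per column, so the rule from the question (keep a value seen at least twice, otherwise NA) can be written as a function of one vector. The following is a sketch of a type-preserving variant, not code from either answer: `n_majority` and its `min_count` argument are hypothetical names. Unlike n_mode, it returns a value of the same type as its input, so it should also work for character columns.

```r
# Majority value of a vector: the most frequent value if it occurs at least
# `min_count` times, otherwise an NA of the same type as the input.
n_majority <- function(x, min_count = 2L) {
  x <- x[!is.na(x)]                      # ignore missing scrapes
  ux <- unique(x)                        # distinct candidate values
  counts <- tabulate(match(x, ux), nbins = length(ux))
  if (length(ux) > 0L && max(counts) >= min_count) {
    ux[which.max(counts)]
  } else {
    x[NA_integer_]                       # typed NA (numeric, character, ...)
  }
}

n_majority(c(2, 1, 1))        # two of three agree
n_majority(c(1, 2, 3))        # no agreement
n_majority(c("x", "y", "x"))  # character input
```

It can be dropped in wherever n_mode is used, for example lapply(.SD, n_majority). With more than three tables a tie is possible (two values each seen twice); this sketch then keeps whichever tied value appears first, which may or may not be the behavior you want.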

Solution 1
3 Accepted 2018-12-06 21:06:21

Solution 2
1 2018-12-06 20:58:00
