Merging duplicate columns in R

Question

I have a data frame like this:

   c1 c2 c3 c4
 r1 1  0  1  1
 r2 0  0  1  1
 r3 0  1  0  0

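In this case, c3 and c4 are identical. I want to drop the duplicated column but keep the names of both c3 and c4, so that the resulting data frame has a single third column whose name combines the names of the identical columns.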

I feel there should be an elegant way to do this that I just can't think of. Any help would be greatly appreciated!

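Edit: just to clarify, my actual data frame is 1000 rows by 1000 columns, and I don't know in advance which columns are identical. So I need an automated way to test whether columns are identical and to combine their names.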

Answer 1

The extra information adds an interesting wrinkle! If you don't need the column names concatenated, you can do the following:

df <- data.frame(c1 = c(1, 0, 0), c2 = c(0, 0, 1), c3 = c(1, 1, 0), c4 = c(1, 1, 0),
                 c5 = c(1, 1, 1), c6 = c(1, 1, 1), c7 = c(2, 2, 2))

library(digest)
df_clean <- df[!duplicated(lapply(df, digest))]

At this point, df_clean contains the data frame with the duplicated columns removed.
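One thing worth checking on real data: both digest() and identical() compare the stored representation, so an integer column and a double column holding the same values are not treated as duplicates. The sketch below uses base R only and a made-up data frame (df_types and its columns are hypothetical, not from the question); coercing every column to double first is one possible workaround, assuming all columns are numeric.

```r
# Hypothetical example data: 'a' is double, 'b' is integer, 'c' repeats 'a'
df_types <- data.frame(a = c(1, 0, 0), b = c(1L, 0L, 0L), c = c(1, 0, 0))

# identical() distinguishes integer from double, so 'b' does not match 'a'
same_type_sensitive <- identical(df_types$a, df_types$b)

# Coercing every column to double makes the comparison value-based
# (assumption: every column is numeric)
df_num <- as.data.frame(lapply(df_types, as.numeric))

# duplicated() on the list of columns flags repeats of an earlier column
dups_after_coercion <- duplicated(as.list(df_num))
```

If the data frame is built from a single numeric matrix, all columns already share one type and this step is unnecessary.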

If the column names really do matter, here is how I would handle it after looking at the other answer:

df_dups <- df[duplicated(lapply(df, digest))] #extract the duplicates

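# seq_len() avoids the 1:0 trap when there are no duplicate columns
for (clean_col in seq_len(ncol(df_clean))) {
  for (dup_col in seq_len(ncol(df_dups))) {
    # append the duplicate's name when its values match the kept column
    if (identical(df_clean[[clean_col]], df_dups[[dup_col]])) {
      colnames(df_clean)[clean_col] <- paste0(colnames(df_clean)[clean_col], colnames(df_dups)[dup_col])
    }
  }
}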

With the extra duplicates added for testing, the output of str(df_clean) looks like this:

'data.frame':   3 obs. of  5 variables:
 $ c1  : num  1 0 0
 $ c2  : num  0 0 1
 $ c3c4: num  1 1 0
 $ c5c6: num  1 1 1
 $ c7  : num  2 2 2

Answer 2

This may not be a super elegant solution, but it gets the job done. If df is your data frame:

dups <- duplicated(lapply(df, function(x) x))
df_clean <- df[!dups]
df_dups <- df[dups]


for (z in seq_len(ncol(df_clean))) {
  i <- names(df_clean)[z]
  q <- df_clean[[i]]
  # indices of the dropped columns whose values match the kept column
  d <- which(vapply(df_dups, function(x) identical(x, q), logical(1)))
  names(df_clean)[z] <- paste0(i, paste(names(df_dups)[d], collapse = ""))
}

The output is:

df_clean
   c1 c2 c3c4
r1  1  0    1
r2  0  0    1
r3  0  1    0

This should also work if a column has more than one duplicate.
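For a data frame with many columns, the same idea can be wrapped in a small helper so it can be reused. This is only a sketch in base R: merge_duplicate_columns and its sep argument are names made up for illustration, not part of any package. It keeps the first column of each group of identical columns and renames it to the concatenated names of the whole group.

```r
# Hypothetical helper: drop duplicated columns and merge their names
merge_duplicate_columns <- function(df, sep = "") {
  # keep only the first occurrence of each distinct column
  out <- df[!duplicated(as.list(df))]
  # for every kept column, collect the names of all columns identical to it
  merged <- vapply(out, function(col) {
    same <- vapply(df, function(x) identical(x, col), logical(1))
    paste(names(df)[same], collapse = sep)
  }, character(1))
  names(out) <- unname(merged)
  out
}

# Usage with the test data from Answer 1
df <- data.frame(c1 = c(1, 0, 0), c2 = c(0, 0, 1), c3 = c(1, 1, 0), c4 = c(1, 1, 0),
                 c5 = c(1, 1, 1), c6 = c(1, 1, 1), c7 = c(2, 2, 2))
res <- merge_duplicate_columns(df)
names(res)
```

On this data the resulting names are c1, c2, c3c4, c5c6 and c7, matching the str() output shown in Answer 1. Each kept column is compared against every original column, which should be acceptable at 1000 columns but is not optimized.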

Answer 1: score 2, accepted, 2016-10-19 18:23:50
Answer 2: score 1, 2016-10-19 16:41:23