Removing duplicates from each row of an R data frame

Question

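I have a data frame with 1209 columns and 27900 rows.

Within each row, duplicated values are scattered across the columns. I tried transposing the data frame and removing the duplicates column by column, but it crashed.

After transposing, I used: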

for(i in 1:ncol(df)){

        #replicate column i without duplicates, fill blanks with NAs
        df <-  cbind.fill(df,unique(df[,1]), fill = NA)
        #rename the new column
        colnames(df)[n+1] <- colnames(df)[1]
        #delete the old column
        df[,1] <- NULL
}

But so far this has produced no result.

I would like to know whether anyone has any ideas.

Best

Answer 1

As I understand it, you want to replace the duplicated values in each column with NA?

This can be done in several ways.

First, some data:

set.seed(7)
df <- data.frame(x = sample(1:20, 50, replace = TRUE),
                 y = sample(1:20, 50, replace = TRUE),
                 z = sample(1:20, 50, replace = TRUE))
head(df, 10)
#output
    x  y  z
1  20 12  8
2   8 15 10
3   3 16 10
4   2 13  8
5   5 15 13
6  16  8  7
7   7  4 20
8  20  4  1
9   4  8 16
10 10  6  5

With the purrr library:

library(purrr)
map_dfc(df, function(x) ifelse(duplicated(x), NA, x))
#output
# A tibble: 50 x 3
       x     y     z
   <int> <int> <int>
 1    20    12     8
 2     8    15    10
 3     3    16    NA
 4     2    13    NA
 5     5    NA    13
 6    16     8     7
 7     7     4    20
 8    NA    NA     1
 9     4    NA    16
10    10     6     5
# ... with 40 more rows
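
To see why only the later occurrences turn into NA, it may help to look at what `duplicated()` returns on a small vector. The following is a minimal sketch; the vector `x` and the names `dup` and `masked` are made up for illustration and do not come from the data above.

```r
# duplicated() returns TRUE for every element that already appeared earlier
# in the vector; the first occurrence is always FALSE.
x <- c(20, 8, 3, 20, 8)
dup <- duplicated(x)
dup
# [1] FALSE FALSE FALSE  TRUE  TRUE

# ifelse() keeps the original value where dup is FALSE and returns NA elsewhere.
masked <- ifelse(dup, NA, x)
masked
# [1] 20  8  3 NA NA
```

This is the same function that `map_dfc()` applies to each column in turn.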

With apply in base R:

as.data.frame(apply(df, 2, function(x) ifelse(duplicated(x), NA, x)))
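
The calls above work column by column, while the title of the question asks about duplicates within each row. A possible row-wise variant is sketched below, assuming all columns share one type (for example, all numeric) so that the data frame converts cleanly to a matrix. The toy data frame `small` and the helper `dedup_row` are hypothetical names used only for this illustration.

```r
# Hypothetical toy data: duplicates are scattered within each row.
small <- data.frame(a = c(1, 5, 7),
                    b = c(1, 6, 7),
                    c = c(2, 5, 7))

# Replace values that were already seen earlier in the same row with NA.
dedup_row <- function(r) replace(r, duplicated(r), NA)

# apply() with MARGIN = 1 passes each row to dedup_row and returns the
# results as columns, so t() restores the original orientation.
result <- as.data.frame(t(apply(as.matrix(small), 1, dedup_row)))
result
#   a  b  c
# 1 1 NA  2
# 2 5  6 NA
# 3 7 NA NA
```

Because this works on the whole matrix at once, it avoids both the manual transpose and growing the data frame inside a loop, which is probably worth trying on a frame of 1209 columns by 27900 rows. If the columns have mixed types, `as.matrix()` will coerce everything to character, so this sketch would need adjusting.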

Solution 1
0 2017-11-10 19:22:35
