R group repeating values
If I am working with a dataset like this:
Id Index Value
1233 i1 Blue
1233 i2 Blue
1233 i3 Blue
6545 i1 Red
6545 i2 NA
6545 i3 Black
4177 i1 NA
4177 i2 NA
4177 i2 NA
How can I create a new dataset that keeps only one instance of the repeated values, for example for 1233 and 4177, as shown below?
Id Index Value
1233 i Blue
6545 i1 Red
6545 i2 NA
6545 i3 Black
4177 i NA
We can use distinct from dplyr, which keeps the first row for each Id/Value combination:
library(dplyr)
distinct(df1, Id, Value, .keep_all = TRUE)
# Id Index Value
#1 1233 i1 Blue
#2 6545 i1 Red
#3 6545 i2 <NA>
#4 6545 i3 Black
#5 4177 i1 <NA>
Or use base R:
df1[!duplicated(df1[c('Id', 'Value')]),]
Data
df1 <- structure(list(Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L,
4177L, 4177L, 4177L), Index = c("i1", "i2", "i3", "i1", "i2",
"i3", "i1", "i2", "i2"), Value = c("Blue", "Blue", "Blue", "Red",
NA, "Black", NA, NA, NA)), class = "data.frame", row.names = c(NA,
-9L))
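Note that both calls above keep Index as i1 for the collapsed Ids, whereas the desired output in the question shows a plain i. If that relabelling matters, the following is a minimal base R sketch, not part of the original answer. It assumes "collapsed" means that every Value of an Id is identical (all-NA included); the names constant, collapsed and out are illustrative only.

```r
# Same data as df1 above, rebuilt here so the sketch runs on its own.
df1 <- data.frame(
  Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for an Id when all of its Values are the same (unique() treats NA as one value).
constant <- tapply(df1$Value, df1$Id, function(v) length(unique(v)) == 1)

# Keep the first row of every (Id, Value) pair, as in the base R call above.
out <- df1[!duplicated(df1[c("Id", "Value")]), ]

# Assumption: a fully collapsed Id gets the generic label "i".
collapsed <- as.vector(constant[as.character(out$Id)])
out$Index[collapsed] <- "i"
out
```

Ids with mixed values, such as 6545, keep their original Index labels.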
Perhaps unique + rownames can help you:
df[as.numeric(rownames(unique(df[-2]))),]
which gives
Id Index Value
1 1233 i1 Blue
4 6545 i1 Red
5 6545 i2 <NA>
6 6545 i3 Black
7 4177 i1 <NA>
Data
df <- structure(list(Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L,
4177L, 4177L, 4177L), Index = c("i1", "i2", "i3", "i1", "i2",
"i3", "i1", "i2", "i2"), Value = c("Blue", "Blue", "Blue", "Red",
NA, "Black", NA, NA, NA)), class = "data.frame", row.names = c(NA,
-9L))
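One caveat: converting the row names with as.numeric only works while df has the default row names 1..n. A possible variant, sketched here under the assumption that row names are unique, indexes by the row names themselves. The "r1".."r9" names are hypothetical and only serve to show that numeric row names are not required.

```r
# Same data as df above, rebuilt here so the sketch runs on its own.
df <- data.frame(
  Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical non-numeric row names; as.numeric() would turn these into NA.
rownames(df) <- paste0("r", seq_len(nrow(df)))

# unique() on the Id/Value columns keeps the row names of the first occurrences,
# so they can be used directly as a character row index.
res <- df[rownames(unique(df[-2])), ]
res
```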
You can use the data.table package and the by argument of its unique method:
library(data.table)
unique(setDT(df), by = c("Id", "Value"))
# Id Index Value
# 1: 1233 i1 Blue
# 2: 6545 i1 Red
# 3: 6545 i2 <NA>
# 4: 6545 i3 Black
# 5: 4177 i1 <NA>
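Keep in mind that setDT converts df to a data.table in place (by reference). If the original data.frame should stay untouched, a sketch along the following lines, using as.data.table to work on a copy, may be preferable. The names dt and res are illustrative only.

```r
library(data.table)

# Same data as df above, rebuilt here so the sketch runs on its own.
df <- data.frame(
  Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so df remains a plain data.frame.
dt <- as.data.table(df)

# Keep the first row of each Id/Value combination.
res <- unique(dt, by = c("Id", "Value"))
res
```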