Get distinct rows from dataframe with specific condition
I have a dataframe:
ID Name Value
1 John 17
1 17
2 NULL
3 NULL
4 Mike 35
4 Mike NULL
5 Leo 22
5 Leo
I want only one row per unique ID. For each ID, I need to keep the row whose columns are most completely filled. So the desired result is:
ID Name Value
1 John 17
2 NULL
3 NULL
4 Mike 35
5 Leo 22
As you can see, all IDs are kept, but now only the rows with the most completely filled columns remain. How could I do that?
I tried df[complete.cases(df),], but it removed the fully empty rows (IDs 2 and 3).
I would use dplyr::distinct() in a pipe like this:
df %>% distinct(ID, .keep_all = TRUE)
(.keep_all = TRUE keeps the other columns)
or in base R:
df[!duplicated(df$ID), ]
Both keep the first row for each ID, which does the job for this data.
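If the most complete row is not guaranteed to come first within each ID, one option is to rank the rows by completeness explicitly. The following is a minimal base R sketch, not part of the original answers. It assumes that NA, the empty string, and the literal string "NULL" all count as missing; the helper is_missing and the variable names are made up for illustration.

```r
# Sample data from the question ("NULL" is stored as a literal string)
dat <- data.frame(
  ID    = c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L),
  Name  = c("John", NA, NA, NA, "Mike", "Mike", "Leo", "Leo"),
  Value = c("17", "17", "NULL", "NULL", "35", NA, "22", "NULL"),
  stringsAsFactors = FALSE
)

# Assumption: NA, "" and the literal string "NULL" are all treated as missing
is_missing <- function(x) is.na(x) | x %in% c("", "NULL")

# Count the filled (non-missing) columns in each row, ignoring the ID column
value_cols <- setdiff(names(dat), "ID")
n_filled <- rowSums(!sapply(dat[value_cols], is_missing))

# Sort by ID, then by completeness (most complete first);
# rows that tie on both keys stay in their original order
sorted <- dat[order(dat$ID, -n_filled), ]

# Keep the first, i.e. most complete, row for each ID
result <- sorted[!duplicated(sorted$ID), ]
rownames(result) <- NULL
result
```

Sorting before deduplicating means the outcome no longer depends on the input order. IDs whose only row is empty (2 and 3 here) are still kept, unlike with complete.cases().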
Using aggregate with the option na.action=na.pass.
aggregate(cbind(Value, Name) ~ ID, dat, el, na.action=na.pass)
# ID Value Name
# 1 1 17 John
# 2 2 NULL <NA>
# 3 3 NULL <NA>
# 4 4 35 Mike
# 5 5 22 Leo
Data:
dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L), Name = c("John",
NA, NA, NA, "Mike", "Mike", "Leo", "Leo"), Value = c("17", "17",
"NULL", "NULL", "35", NA, "22", "NULL")), class = "data.frame", row.names = c(NA,
-8L))