Get distinct rows from dataframe with specific condition

I have a dataframe:

ID    Name    Value
1     John    17
1             17
2             NULL
3             NULL
4     Mike    35
4     Mike    NULL
5     Leo     22
5     Leo

I want only one row per ID. For each ID, I need to keep the row with the most filled-in columns. So the desired result is:

ID    Name    Value
1     John    17
2             NULL
3             NULL
4     Mike    35
5     Leo     22

As you can see, all IDs are kept, but now only the rows with the most filled-in columns remain. How could I do that?

I tried df[complete.cases(df), ], but it removed the fully empty rows (IDs 2 and 3).

I would use dplyr::distinct() in a pipe like this:

library(dplyr)
df %>% distinct(ID, .keep_all = TRUE)

(.keep_all = TRUE keeps the other columns.)

or in base R:

df[!duplicated(df$ID), ]

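Both keep the first row they encounter for each ID, so they do the job here because the most complete row happens to come first within each ID.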
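If the fullest row is not guaranteed to come first within each ID, one option is to count the filled-in columns per row, sort on that count, and only then drop duplicates. The following is a minimal base R sketch of that idea, not part of the original answers. It assumes that NA, empty strings, and the literal string "NULL" all count as missing; the helper is_missing and the variables n_filled, sorted and result are names made up for illustration.

```r
# Example data mirroring the question (assumption: "NULL" is stored as a string).
df <- data.frame(
  ID    = c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L),
  Name  = c("John", NA, NA, NA, "Mike", "Mike", "Leo", "Leo"),
  Value = c("17", "17", "NULL", "NULL", "35", NA, "22", "NULL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat NA, "" and the literal "NULL" as missing.
is_missing <- function(x) is.na(x) | x %in% c("", "NULL")

# Count the filled-in non-ID columns in each row.
cols <- setdiff(names(df), "ID")
n_filled <- rowSums(!sapply(df[cols], is_missing))

# Sort by ID, fullest row first within each ID, then keep the first row per ID.
sorted <- df[order(df$ID, -n_filled), ]
result <- sorted[!duplicated(sorted$ID), ]
rownames(result) <- NULL
result
```

With the sample data this keeps one row per ID, including the fully empty IDs 2 and 3. Ties within an ID are resolved in favour of the row that appears first, because order() is stable.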

Using aggregate with the option na.action=na.pass:

aggregate(cbind(Value, Name) ~ ID, dat, el, na.action=na.pass)
#   ID Value Name
# 1  1    17 John
# 2  2  NULL <NA>
# 3  3  NULL <NA>
# 4  4    35 Mike
# 5  5    22  Leo

Data:

dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L), Name = c("John", 
NA, NA, NA, "Mike", "Mike", "Leo", "Leo"), Value = c("17", "17", 
"NULL", "NULL", "35", NA, "22", "NULL")), class = "data.frame", row.names = c(NA, 
-8L))
