Remove duplicates while avoiding rows with NA values in R
My data set (df) looks like this:
ID Name Rating Score Ranking
1 abc 3 NA NA
1 abc 3 12 13
2 bcd 4 NA NA
2 bcd 4 19 20
I'm trying to remove duplicates using:
df <- df[!duplicated(df[1:2]),]
which gives:
ID Name Rating Score Ranking
1 abc 3 NA NA
2 bcd 4 NA NA
but I'm trying to get:
ID Name Rating Score Ranking
1 abc 3 12 13
2 bcd 4 19 20
How do I avoid keeping the rows that contain NAs when removing duplicates? Some help would be great, thanks.
First, sort the data so that the NAs come last within each ID/Name pair, using na.last = TRUE:
df <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]
Then remove the duplicates with the fromLast = FALSE argument (the default), which keeps the first row of each ID/Name pair:
df <- df[!duplicated(df[1:2],fromLast = FALSE),]
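Put together on the example data, the two steps can be sketched as below. The data frame is rebuilt by hand from the question so the snippet runs on its own; the names sorted and result are only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

# Step 1: sort by the key columns, then by Score and Ranking;
# na.last = TRUE puts the NA rows at the end of each ID/Name group
sorted <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]

# Step 2: keep the first row of each ID/Name group
result <- sorted[!duplicated(sorted[1:2], fromLast = FALSE), ]
print(result)
```

Note that na.last has to be passed to order() itself; if it is given to with() instead, it is silently ignored and only order()'s default applies.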
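Using dplyr (this keeps the last row of each ID/Name pair, so it relies on the complete row coming after the NA row, as in the example data):
df <- df %>% filter(!duplicated(.[, 1:2], fromLast = TRUE))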
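If the row order cannot be relied on, one possible dplyr variant sorts the complete rows first and then keeps one row per ID/Name pair. This is only a sketch that swaps in arrange() and distinct() in place of duplicated(); it assumes the dplyr package is installed and that an NA in Score is what marks the incomplete row.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

result <- df %>%
  # FALSE sorts before TRUE, so rows with a Score come first in each group
  arrange(ID, Name, is.na(Score)) %>%
  # keep the first row of each ID/Name pair, with all other columns
  distinct(ID, Name, .keep_all = TRUE)

print(result)
```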
You could just filter out the observations you don't want with which() and then use the unique() function:
a <- unique(c(which(!is.na(df[, 'Score'])), which(!is.na(df[, 'Ranking']))))
df2<-unique(df[a,])
> df2
ID Name Rating Score Ranking
2 1 abc 3 12 13
4 2 bcd 4 19 20
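One difference between the approaches is worth checking on your own data: filtering out the NA rows drops an ID entirely when all of its rows contain NA, whereas sorting and then calling duplicated() still keeps one row for it. The sketch below illustrates this with a hypothetical third ID (3, "cde") that is not in the original data.

```r
# Example data plus a hypothetical ID 3 whose only row has NAs
df <- data.frame(
  ID      = c(1, 1, 2, 2, 3),
  Name    = c("abc", "abc", "bcd", "bcd", "cde"),
  Rating  = c(3, 3, 4, 4, 5),
  Score   = c(NA, 12, NA, 19, NA),
  Ranking = c(NA, 13, NA, 20, NA)
)

# Filter approach: keep rows where Score or Ranking is present, then de-duplicate
keep <- unique(c(which(!is.na(df$Score)), which(!is.na(df$Ranking))))
filtered <- unique(df[keep, ])

# Sort approach: NA rows go last within each group, first row per ID/Name is kept
sorted <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]
deduped <- sorted[!duplicated(sorted[1:2]), ]

print(filtered$ID)  # ID 3 is missing here
print(deduped$ID)   # ID 3 is kept, with NA in Score and Ranking
```

Which behaviour is right depends on whether an ID with no complete row should survive in the result.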