Remove duplicates while accounting for NA values in R

My data set (df) looks like this:

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   1     abc       3        12      13
   2     bcd       4        NA      NA
   2     bcd       4        19      20

I'm trying to remove duplicates using:

   df <- df[!duplicated(df[1:2]),]

which gives:

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   2     bcd       4        NA      NA

but I'm trying to get:

   ID    Name    Rating    Score  Ranking
   1     abc       3        12      13
   2     bcd       4        19      20

How do I avoid keeping the rows that contain NAs when removing duplicates? Some help would be great, thanks.

First, sort so that the NAs come last by passing na.last = TRUE to order():

df <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]

Then remove the duplicates with fromLast = FALSE, which keeps the first row of each ID/Name pair:

df <- df[!duplicated(df[1:2], fromLast = FALSE), ]
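
For anyone who wants to run the two steps above end to end, here is a minimal self-contained sketch. The data frame is rebuilt from the table in the question; the column types are an assumption, since the question does not show them, and the names `sorted` and `result` are only illustrative.

```r
# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20),
  stringsAsFactors = FALSE
)

# Sort within each ID/Name pair so rows with NA in Score/Ranking come last.
# na.last = TRUE is order()'s default; it is spelled out here for clarity.
sorted <- df[order(df$ID, df$Name, df$Score, df$Ranking, na.last = TRUE), ]

# Keep the first row of each ID/Name pair, which is now a non-NA row.
result <- sorted[!duplicated(sorted[c("ID", "Name")]), ]
print(result)
```

Selecting the key columns by name (`c("ID", "Name")`) instead of by position (`1:2`) is a small design choice that keeps the code working if the column order changes.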

Using dplyr on the original, unsorted df, where the complete row comes last within each ID/Name pair:

df <- df %>% filter(!duplicated(.[, 1:2], fromLast = TRUE))
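
Because `fromLast` keeps the last row of each ID/Name pair, the dplyr line above depends on the order of the rows. A possible variant that does not depend on row order is sketched below, assuming the dplyr package is installed; the data frame is rebuilt from the question's table with assumed column types, and `result` is an illustrative name.

```r
library(dplyr)

# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20),
  stringsAsFactors = FALSE
)

result <- df %>%
  # is.na() yields FALSE for present values, and FALSE sorts before TRUE,
  # so complete rows move to the front of each ID/Name pair.
  arrange(ID, Name, is.na(Score), is.na(Ranking)) %>%
  # distinct() keeps the first row for each ID/Name combination.
  distinct(ID, Name, .keep_all = TRUE)

print(result)
```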

You could just filter out the observations you don't want with which() and is.na(), and then use the unique() function:

a <- unique(c(which(!is.na(df[, 'Score'])), which(!is.na(df[, 'Ranking']))))

df2 <- unique(df[a, ])

> df2
  ID Name Rating Score Ranking
2  1  abc      3    12      13
4  2  bcd      4    19      20
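
A stricter variant of the same filtering idea can be sketched with complete.cases(). Note the difference: the approach above keeps a row if either Score or Ranking is present, while this sketch keeps a row only if both are present. For the example data the result is the same. The data frame is rebuilt from the question's table with assumed column types, and `keep` is an illustrative name.

```r
# Rebuild the example data from the question (column types are assumed).
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20),
  stringsAsFactors = FALSE
)

# TRUE for rows where both Score and Ranking are non-NA.
keep <- complete.cases(df[, c("Score", "Ranking")])

# Drop the incomplete rows, then remove any exact duplicate rows.
df2 <- unique(df[keep, ])
print(df2)
```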
