
Remove duplicates from ONE column, not the whole row

I am trying to remove duplicate emails in a column of my data.frame using duplicated() and distinct() in R. However, I do not want to delete the whole row, just the duplicate email addresses in that column. Is there any way to do that with these functions, or is there another way?

library(tidyverse)
patient2 <- c("John Doe", "Peter Gynn", "Jolie Hope", "Mycroft Holmes",
"Carrie Bird", "Carrie Bird", "Marcus Quimby", "Jennifer Poe", "Donna Moon")
salary2 <- c(21000, 23400, 26800, 40000, 50000, 33000, 24000, 75000, 90000)
email2 <- c("doe@gmail.com", "gynn@gmail.com", "hope@gmail.com", 
"holmes@gmail.com", "bird@gmail.com", "bird@gmail.com", "quimby@gmail.com", 
"poe@gmail.com", "moon@gmail.com")
startdate2 <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2020-7-19', 
'2019-4-20', '2018-2-13', '2017-4-21', '2019-6-10', '2010-9-19'))

patient.data_2 <- data.frame(patient2, salary2, email2, startdate2)
print(patient.data_2)


patient2<fctr> salary2<dbl> email2<fctr> startdate2<date>
John Doe       21000    doe@gmail.com       2010-11-01  
Peter Gynn     23400    gynn@gmail.com      2008-03-25  
Jolie Hope     26800    hope@gmail.com      2007-03-14  
Mycroft Holmes 40000    holmes@gmail.com    2020-07-19  
Carrie Bird    50000    bird@gmail.com      2019-04-20  
Carrie Bird    33000    bird@gmail.com      2018-02-13  
Marcus Quimby  24000    quimby@gmail.com    2017-04-21  
Jennifer Poe   75000    poe@gmail.com       2019-06-10  
Donna Moon     90000    moon@gmail.com      2010-09-19    

# My attempt: this drops the whole duplicate row, not just the email
extracted <- patient.data_2[!duplicated(patient.data_2$email2), ]
extracted

All I would like to do is remove the extra duplicate email for Carrie Bird, not the entire row, because the date is different. I tried duplicated() and distinct(), and both removed the entire row.

You could use the duplicated() function:

dat <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4))
dat$a[duplicated(dat$a)] <- NA
dat
#>     a
#> 1   1
#> 2  NA
#> 3   2
#> 4  NA
#> 5   3
#> 6  NA
#> 7   4
#> 8  NA
#> 9  NA
#> 10 NA
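The same idea might carry over to the email column from the question. This is a minimal sketch that assumes a three-row stand-in for patient.data_2 (same column names, fewer rows) and sets stringsAsFactors = FALSE so the emails are plain character strings:

```r
# Small stand-in for the question's data frame (a subset of its rows, for illustration)
patient.data_2 <- data.frame(
  patient2   = c("Carrie Bird", "Carrie Bird", "Marcus Quimby"),
  email2     = c("bird@gmail.com", "bird@gmail.com", "quimby@gmail.com"),
  startdate2 = as.Date(c("2019-04-20", "2018-02-13", "2017-04-21")),
  stringsAsFactors = FALSE
)

# Blank out repeated emails only; every row (and its date) is kept
patient.data_2$email2[duplicated(patient.data_2$email2)] <- NA
patient.data_2
```

Both Carrie Bird rows stay in the data frame; only the second occurrence of the email becomes NA.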

Using dplyr:

library(dplyr)
dat <- dat %>% 
      mutate(a = replace(a, duplicated(a), NA))
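By default duplicated() marks every occurrence after the first, so the first row keeps its value. If the last occurrence should keep the email instead, the fromLast argument of duplicated() may help. A rough sketch, assuming a hypothetical data frame named emails that mirrors two columns from the question:

```r
library(dplyr)

# Hypothetical stand-in data; column names follow the question
emails <- data.frame(
  email2     = c("bird@gmail.com", "bird@gmail.com", "quimby@gmail.com"),
  startdate2 = as.Date(c("2019-04-20", "2018-02-13", "2017-04-21")),
  stringsAsFactors = FALSE
)

# Default: the first occurrence keeps its email, later ones become NA
keep_first <- emails %>%
  mutate(email2 = replace(email2, duplicated(email2), NA))

# fromLast = TRUE: the last occurrence keeps its email instead
keep_last <- emails %>%
  mutate(email2 = replace(email2, duplicated(email2, fromLast = TRUE), NA))
```

In both cases the number of rows is unchanged; only which row retains the email differs.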
