Remove duplicates from ONE column, not the row
I am trying to remove duplicate emails in a column of my data.frame using duplicated() and distinct() in R. However, I do not need it to delete the whole row, just the duplicate email addresses in that column. Is there any way to do that using these functions? Or is there another way to do this?
library(tidyverse)
patient2 <- c('John Doe','Peter Gynn','Jolie Hope', "Mycroft Holmes",
"Carrie Bird", "Carrie Bird", "Marcus Quimby", "Jennifer Poe", "Donna Moon")
salary2 <- c(21000, 23400, 26800, 40000, 50000, 33000, 24000, 75000, 90000)
email2 <- c("doe@gmail.com", "gynn@gmail.com", "hope@gmail.com",
"holmes@gmail.com", "bird@gmail.com", "bird@gmail.com", "quimby@gmail.com",
"poe@gmail.com", "moon@gmail.com")
startdate2 <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2020-7-19',
'2019-4-20', '2018-2-13', '2017-4-21', '2019-6-10', '2010-9-19'))
patient.data_2 <- data.frame(patient2, salary2, email2, startdate2)
print(patient.data_2)
patient2 <fctr>   salary2 <dbl>   email2 <fctr>      startdate2 <date>
John Doe          21000           doe@gmail.com      2010-11-01
Peter Gynn        23400           gynn@gmail.com     2008-03-25
Jolie Hope        26800           hope@gmail.com     2007-03-14
Mycroft Holmes    40000           holmes@gmail.com   2020-07-19
Carrie Bird       50000           bird@gmail.com     2019-04-20
Carrie Bird       33000           bird@gmail.com     2018-02-13
Marcus Quimby     24000           quimby@gmail.com   2017-04-21
Jennifer Poe      75000           poe@gmail.com      2019-06-10
Donna Moon        90000           moon@gmail.com     2010-09-19
# My attempt (names adjusted to match the example above); this drops the whole row
extracted <- patient.data_2[!duplicated(patient.data_2$email2), ]
extracted
All I would like to do is remove the extra duplicate email for Carrie Bird, not the entire row, because the date is different. I tried using duplicated() and distinct(), and both removed the entire row.
You could use the duplicated() function to replace the repeated values with NA while keeping every row:
dat <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4))
dat$a[duplicated(dat$a)] <- NA
dat
#> a
#> 1 1
#> 2 NA
#> 3 2
#> 4 NA
#> 5 3
#> 6 NA
#> 7 4
#> 8 NA
#> 9 NA
#> 10 NA
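Applied to the data in the question, the same idea might look like the sketch below. It assumes the names from the question (patient.data_2, email2) and uses only a four-row excerpt of that data. stringsAsFactors = FALSE is an assumption made here so the emails are plain character strings; the assignment also works on a factor column.

```r
# Four-row excerpt of the question's data (values copied from the question)
patient.data_2 <- data.frame(
  patient2 = c("Mycroft Holmes", "Carrie Bird", "Carrie Bird", "Marcus Quimby"),
  salary2 = c(40000, 50000, 33000, 24000),
  email2 = c("holmes@gmail.com", "bird@gmail.com", "bird@gmail.com", "quimby@gmail.com"),
  startdate2 = as.Date(c("2020-07-19", "2019-04-20", "2018-02-13", "2017-04-21")),
  stringsAsFactors = FALSE  # assumption: keep emails as character, not factor
)

# duplicated() is TRUE for the second and later occurrences of each value
dup <- duplicated(patient.data_2$email2)

# Blank only the email cell; the row, salary and start date stay in place
patient.data_2$email2[dup] <- NA

print(patient.data_2)
```

Both Carrie Bird rows remain, and only the second one loses its email. Indexing the data frame with `!duplicated(...)` in the row position, as in the question, is what removes the whole row.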
Using dplyr:
library(dplyr)
dat <- dat %>%
  mutate(a = replace(a, duplicated(a), NA))
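To show the difference from distinct(), here is a small sketch on a four-row excerpt of the question's data. The object names cleaned and dropped are hypothetical, chosen only for illustration, and stringsAsFactors = FALSE is an assumption so that email2 is a character column.

```r
library(dplyr)

# Four-row excerpt of the question's data (values copied from the question)
patient.data_2 <- data.frame(
  patient2 = c("Mycroft Holmes", "Carrie Bird", "Carrie Bird", "Marcus Quimby"),
  salary2 = c(40000, 50000, 33000, 24000),
  email2 = c("holmes@gmail.com", "bird@gmail.com", "bird@gmail.com", "quimby@gmail.com"),
  startdate2 = as.Date(c("2020-07-19", "2019-04-20", "2018-02-13", "2017-04-21")),
  stringsAsFactors = FALSE  # assumption: keep emails as character, not factor
)

# mutate() rewrites the column in place: repeated emails become NA, rows are kept
cleaned <- patient.data_2 %>%
  mutate(email2 = replace(email2, duplicated(email2), NA))

# For comparison: distinct() keeps one row per email and drops the others entirely
dropped <- patient.data_2 %>%
  distinct(email2, .keep_all = TRUE)

nrow(cleaned)  # same number of rows as the input
nrow(dropped)  # one row fewer, the second Carrie Bird row is gone
```

mutate() changes values within a column, whereas distinct() filters rows, which is why distinct() cannot blank a single cell.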