Validating duplicates in a data frame in R
I am trying to flag duplicate records in a data frame, as shown below. How can I ignore blank or NA values? The email column can sometimes be blank or NA.
library(dplyr)
df4 <- data.frame(
  emp_id = c("DEV-2962","KTN_2252","ANA2719","ITI_2624","KTN_2252","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","rahul.singh@abcd.com","prabal.garg@xyz.com","sanu.ali@abcd.com","salman.abbas@abcd.com","","",NA,NA,"giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com")
)
ID <- "emp_id"
Email <- "email"
# Flag the second and later occurrences of each value (1 = duplicate)
df4 <- df4 %>%
  mutate(across(all_of(c(ID, Email)), ~ as.integer(duplicated(.)), .names = "flag_{col}"))
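The reason the code above also flags blank and NA emails is that duplicated() by default compares "" and NA like any other value. A minimal base R sketch on a toy vector (x is made up for illustration and is not part of df4):

```r
# Toy vector, made up for illustration
x <- c("a", "", "a", "", NA, NA)

# Default behaviour: repeated "" and repeated NA count as duplicates
default_flags <- duplicated(x)
print(default_flags)
# FALSE FALSE  TRUE  TRUE FALSE  TRUE
```

The second "" and the second NA are reported as duplicates, which is the behaviour the question wants to avoid.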
You can convert all blank values to NA and use the incomparables argument of duplicated() in your code:
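df4 <- df4 %>%
  # Turn empty strings into NA in every column
  mutate(across(everything(), ~ ifelse(. == "", NA, .))) %>%
  # NA is never treated as a duplicate of another NA
  mutate(across(all_of(c(ID, Email)), ~ as.integer(duplicated(., incomparables = NA)), .names = "flag_{col}"))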
You can also pass more than one value to incomparables, so blank strings do not need to be converted first:
df4 <- df4 %>%
  # Neither "" nor NA is treated as a duplicate
  mutate(across(all_of(c(ID, Email)), ~ as.integer(duplicated(., incomparables = c("", NA))), .names = "flag_{col}"))
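Both options produce the same flags. The output below is from the first option; with the second, blank values stay as empty strings instead of <NA>: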
     emp_id                  email flag_emp_id flag_email
1  DEV-2962     akash.dev@abcd.com           0          0
2  KTN_2252   rahul.singh@abcd.com           0          0
3   ANA2719  salman.abbas@abcd.com           0          0
4  ITI_2624       ram.lal@abcd.com           0          0
5  KTN_2252   rahul.singh@abcd.com           1          1
6   HRT2921    prabal.garg@xyz.com           0          0
7      <NA>      sanu.ali@abcd.com           0          0
8   KTN2624  salman.abbas@abcd.com           0          1
9   DEV2698                   <NA>           0          0
10  ITI2535                   <NA>           0          0
11  DEV2698                   <NA>           1          0
12  HRT2837                   <NA>           0          0
13  ERV2951  giriraj.singh@dkl.com           0          0
14  KTN2542 lokesh.sharma@abcd.com           0          0
15  ANA2813   pooja.pawar@abcd.com           0          0
16  ITI2210 nikita.sharma@abcd.com           0          0
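If the goal is to display only the duplicated records, the flag columns can be used as a filter. The following is a rough base R sketch, assuming a small made-up data frame (toy and dupes are hypothetical names, not from the question) and the same incomparables = c("", NA) idea:

```r
# Toy data, made up for illustration (not the df4 from the question)
toy <- data.frame(
  emp_id = c("A1", "B2", "A1", "", ""),
  email  = c("a@x.com", NA, "a@x.com", "c@x.com", NA),
  stringsAsFactors = FALSE
)

# Flag later occurrences, skipping "" and NA
skip <- c("", NA)
toy$flag_emp_id <- as.integer(duplicated(toy$emp_id, incomparables = skip))
toy$flag_email  <- as.integer(duplicated(toy$email,  incomparables = skip))

# Keep only rows where at least one flag is set
dupes <- toy[toy$flag_emp_id == 1 | toy$flag_email == 1, ]
print(dupes)
```

Here only the third row is kept, since it repeats both the emp_id and the email of the first row, while the repeated "" and NA entries are ignored. Note that duplicated() marks only the second and later occurrences, so the first copy of each duplicate is not flagged.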