
Validating duplicates in a data frame in R

I am trying to flag duplicate records in a data frame, as shown below. How can I ignore blank or NA values? The email column is sometimes blank or NA.


df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","KTN_2252","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","rahul.singh@abcd.com","prabal.garg@xyz.com","sanu.ali@abcd.com","salman.abbas@abcd.com","","",NA,NA,"giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))


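library(dplyr)

ID = "emp_id"
Email = "email"

# Flag each repeat of a value as 1; repeated blank and NA values are flagged too
df4 <- df4 %>% 
  mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(.)), .names = 'flag_{col}'))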


You can convert all blank values to NA and pass NA to the incomparables argument of duplicated():

df4 <- df4 %>% 
  mutate(across(everything(), ~ifelse(. == "", NA, .))) %>% 
  mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(., incomparables = NA)), .names = 'flag_{col}'))
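To see what incomparables changes, here is a minimal sketch on a toy vector. The vector and the variable names are made up for illustration and are not part of the question's data.

```r
# Toy vector, not taken from the question's data
x <- c("a@x.com", NA, "a@x.com", NA)

# Default behaviour: the second NA counts as a duplicate of the first
default_flags <- duplicated(x)

# With incomparables = NA, NA never matches anything, so it is never flagged
na_safe_flags <- duplicated(x, incomparables = NA)

print(default_flags)  # FALSE FALSE  TRUE  TRUE
print(na_safe_flags)  # FALSE FALSE  TRUE FALSE
```

The repeated email is still flagged in both cases; only the treatment of NA differs.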

Alternatively, skip the conversion and pass both "" and NA to incomparables:

df4 <- df4 %>% 
  mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(., incomparables = c("", NA))), .names = 'flag_{col}'))

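Both options produce the same flags. The output below is from the first option; the second leaves blank values as empty strings instead of <NA>: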

     emp_id                  email flag_emp_id flag_email
1  DEV-2962     akash.dev@abcd.com           0          0
2  KTN_2252   rahul.singh@abcd.com           0          0
3   ANA2719  salman.abbas@abcd.com           0          0
4  ITI_2624       ram.lal@abcd.com           0          0
5  KTN_2252   rahul.singh@abcd.com           1          1
6   HRT2921    prabal.garg@xyz.com           0          0
7      <NA>      sanu.ali@abcd.com           0          0
8   KTN2624  salman.abbas@abcd.com           0          1
9   DEV2698                   <NA>           0          0
10  ITI2535                   <NA>           0          0
11  DEV2698                   <NA>           1          0
12  HRT2837                   <NA>           0          0
13  ERV2951  giriraj.singh@dkl.com           0          0
14  KTN2542 lokesh.sharma@abcd.com           0          0
15  ANA2813   pooja.pawar@abcd.com           0          0
16  ITI2210 nikita.sharma@abcd.com           0          0
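
If the goal is to show only the duplicate records, the flag columns can be used as a filter. The following is a rough sketch under assumptions: the small data frame `df`, the vector `cols`, and the result names `flagged` and `dups` are stand-ins invented for illustration, and `if_any()` requires dplyr 1.0.4 or later.

```r
library(dplyr)

# Small stand-in for df4 (values are illustrative, not from the question)
df <- data.frame(
  emp_id = c("A1", "B2", "B2", "", "C3", "D4"),
  email  = c("a@x.com", "b@x.com", "b@x.com", "c@x.com", "", NA)
)
cols <- c("emp_id", "email")

# Flag repeats, treating "" and NA as incomparable (never duplicates)
flagged <- df %>%
  mutate(across(all_of(cols),
                ~ as.integer(duplicated(.x, incomparables = c("", NA))),
                .names = "flag_{col}"))

# Keep only the rows where at least one flag column is 1
dups <- flagged %>%
  filter(if_any(starts_with("flag_"), ~ .x == 1))

print(dups)
```

Note that duplicated() marks only the second and later occurrences, so the first copy of each repeated value is not included in `dups`.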
