简体   繁体   中英

Filter only rows that are duplicated using dplyr

I have been trying for a while now to solve a problem close to the one as presented at this issue with no success. This consists in filtering for items that are duplicated in a group, but also considering the original one used for comparison with dplyr ( I prefer dplyr over base or data.table ).

The solution I tried is as follows:

> a <- data.frame(name=c("a","b","b","b","a","a"),position=c(1,2,1,2,2,2),achieved=c(1,0,0,0,1,0))
> a %>% group_by(name,achieved) %>% mutate(duplicated=duplicated(position))
# A tibble: 6 x 4
# Groups:   name, achieved [3]
  name  position achieved duplicated
  <fct>    <dbl>    <dbl> <lgl>     
1 a            1        1 FALSE     
2 b            2        0 FALSE     
3 b            1        0 FALSE     
4 b            2        0 TRUE      
5 a            2        1 FALSE     
6 a            2        0 FALSE

I know that this solution is close to the one I desire, but it only brings me the values that are duplicated after the first one , but I would also like a dplyr solution that gives me all duplicate values per group , so probably this could help me improve my dplyr understanding.

The desired output would be as follows:

# A tibble: 6 x 4
# Groups:   name, achieved [3]
  name  position achieved duplicated
  <fct>    <dbl>    <dbl> <lgl>     
1 a            1        1 FALSE     
2 b            2        0 TRUE      
3 b            1        0 FALSE     
4 b            2        0 TRUE      
5 a            2        1 FALSE     
6 a            2        0 FALSE

Thanks in advance.

Try this:

a %>%
  group_by_all() %>%
  mutate(duplicated = n() > 1)

It seems like you want to group by all of name, position, and acheived and then just see if there are more than one record in that group

a %>% group_by(name,achieved, position) %>% mutate(duplicated = n()>1)

#   name  position achieved duplicated
#  <fct>    <dbl>    <dbl> <lgl>     
# 1 a            1        1 FALSE     
# 2 b            2        0 TRUE      
# 3 b            1        0 FALSE     
# 4 b            2        0 TRUE      
# 5 a            2        1 FALSE     
# 6 a            2        0 FALSE  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM