
Filter a column if another column contains a specific set of values using dplyr in R

In the following data frame, I want to keep only the groups that contain all of the persons "a", "b", and "c":

df <- structure(list(
    group = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
    person = structure(c(1L, 2L, 1L, 3L, 1L, 2L, 3L, 1L, 1L, 2L, 3L, 4L),
                       .Label = c("a", "b", "c", "e"), class = "factor")),
    .Names = c("group", "person"),
    row.names = c(NA, -12L), class = "data.frame")

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), group by 'group', and build a logical index that is TRUE when all of 'a', 'b', 'c' are %in% 'person'. Indexing the Subset of Data.table (.SD) with that single TRUE or FALSE keeps or drops the whole group.

library(data.table)
setDT(df)[, .SD[all(c('a', 'b', 'c') %in% person)], group]

Or with dplyr, using the same methodology after grouping by 'group'

library(dplyr)
df %>%
   group_by(group) %>%
   filter(all(c('a', 'b', 'c') %in% person))
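To see which groups the grouped condition above selects, the same test can be evaluated per group without any packages. This is only a minimal sketch: the data frame is rebuilt with data.frame() so the block runs on its own, and the names wanted and has_all are introduced here for illustration.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = factor(c("a", "b", "a", "c", "a", "b", "c", "a", "a", "b", "c", "e"))
)

# Illustrative name: the set of values every kept group must contain
wanted <- c("a", "b", "c")

# One logical value per group: TRUE when every wanted value occurs in it
has_all <- tapply(df$person, df$group, function(p) all(wanted %in% p))

# Names of the groups that pass the test
names(has_all)[has_all]
```

Because the condition yields a single TRUE or FALSE per group, it keeps or drops all rows of that group at once.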

Or with base R

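# v1 is TRUE for each group in which 'a', 'b' and 'c' all have a count above zero
v1 <- rowSums(table(df)[, c('a', 'b', 'c')]>0)==3
# keep the rows whose group is flagged TRUE
subset(df, group %in% names(v1)[v1])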

Update

If we want to return only group 2, i.e. the group whose persons are exactly 'a', 'b', and 'c' with nothing else, using dplyr

df %>%
   group_by(group) %>%
   filter(all(c('a', 'b', 'c') %in% person), all(person %in% c('a', 'b', 'c')))

Or with n_distinct

df %>%
   group_by(group) %>%
   filter(all(c('a', 'b', 'c') %in% person), n_distinct(person)==3)

Or with data.table

setDT(df)[, .SD[all(c('a', 'b', 'c') %in% person) & uniqueN(person)==3], group]
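The exact-match variant can also be sketched in base R with setequal(), which compares the two sets directly instead of combining two conditions. This is an alternative illustration rather than part of the answer above; the names wanted, exact, and result are assumptions introduced here, and the data frame is rebuilt so the block stands alone.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = factor(c("a", "b", "a", "c", "a", "b", "c", "a", "a", "b", "c", "e"))
)

# Illustrative name: the exact set of persons a kept group must have
wanted <- c("a", "b", "c")

# TRUE only when a group's persons are exactly the wanted set, no more, no less
exact <- tapply(df$person, df$group,
                function(p) setequal(as.character(p), wanted))

# Keep the rows of the groups that match exactly
result <- subset(df, group %in% names(exact)[exact])
result
```

Group 4 is dropped here because it also contains 'e', and groups 1 and 3 are dropped because they are missing at least one of the wanted values.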


 