
Filter data by group & preserve empty groups

How can I filter my data by group while preserving the groups that end up empty?

Example:

year = c(1,2,3,1,2,3,1,2,3)
site = rep(c("a", "b", "d"), each = 3)
value = c(3,3,0,1,8,5,10,18,27)

df <- data.frame(year, site, value)

I want to keep only the rows where value is greater than 5. For some groups, this is never true, and filter() simply drops those groups.

How can I keep the empty groups and get NA for them instead? Ideally, I would like to use dplyr functions rather than base R.

My filtering approach, in which .preserve = TRUE does not keep the empty groups in the output:

library(dplyr)

df %>%
  group_by(site) %>%
  filter(value > 5, .preserve = TRUE)
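
As a rough illustration of why this does not help: .preserve = TRUE keeps the grouping structure of the result, but filter() never creates rows. The sketch below compares the group count with and without the argument. The names kept and dropped are only for illustration.

```r
library(dplyr)

df <- data.frame(
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  site  = rep(c("a", "b", "d"), each = 3),
  value = c(3, 3, 0, 1, 8, 5, 10, 18, 27)
)

# Grouping structure kept as is: site "a" is still a group, but has no rows.
kept <- df %>%
  group_by(site) %>%
  filter(value > 5, .preserve = TRUE)

# Default behaviour: grouping is recalculated from the remaining rows.
dropped <- df %>%
  group_by(site) %>%
  filter(value > 5)

nrow(kept)        # rows that passed the filter
n_groups(kept)    # groups recorded when .preserve = TRUE
n_groups(dropped) # groups recorded with the default
```

Either way the printed rows are the same, so an NA row for site a has to be added by a separate step.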

Expected output:

   year site  value
  <dbl> <fct> <dbl>
1    NA a        NA
2     2 b         8
3     1 d        10
4     2 d        18
5     3 d        27

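With the addition of tidyr, you can do:

library(dplyr)
library(tidyr)

df %>%
 group_by(site) %>%
 filter(value > 5) %>%
 ungroup() %>%
 # add back every site from the original data; missing columns become NA
 complete(site = df$site)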

  site   year value
  <fct> <dbl> <dbl>
1 a        NA    NA
2 b         2     8
3 d         1    10
4 d         2    18
5 d         3    27

Or, if you want to stay within dplyr:

df %>% 
 group_by(site) %>% 
 filter(value > 5) %>%
 bind_rows(df %>% 
            group_by(site) %>% 
            filter(all(value <= 5)) %>%
            summarise_all(~ NA))
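
A further dplyr-only possibility, not taken from the answers here, is to join the filtered rows back onto the full list of sites. This is a minimal sketch, and the names all_sites and result are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  site  = rep(c("a", "b", "d"), each = 3),
  value = c(3, 3, 0, 1, 8, 5, 10, 18, 27)
)

# One row per site, taken from the unfiltered data.
all_sites <- distinct(df, site)

result <- df %>%
  filter(value > 5) %>%
  # right_join keeps every site in all_sites; sites with no
  # surviving rows get NA in year and value.
  right_join(all_sites, by = "site") %>%
  arrange(site, year)

result
```

The join does not depend on grouping at all, so group_by() is not needed for this variant.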

Another option uses the nesting functionality of tidyr together with purrr::map:

df %>% 
  group_by(site) %>% 
  tidyr::nest() %>% 
  mutate(data = purrr::map(data, . %>% filter(value > 5))) %>% 
  tidyr::unnest(cols=c(data), keep_empty = TRUE)
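
To make the intermediate steps of the nesting approach easier to follow, here is the same idea split into three stages. The variable names nested, filtered, rows_per_site and result are only illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  site  = rep(c("a", "b", "d"), each = 3),
  value = c(3, 3, 0, 1, 8, 5, 10, 18, 27)
)

# Step 1: one row per site, with that site's rows stored in the list-column `data`.
nested <- df %>%
  group_by(site) %>%
  nest()

# Step 2: filter inside each nested data frame; a site with no match keeps a 0-row entry.
filtered <- nested %>%
  mutate(data = map(data, ~ filter(.x, value > 5)))

# Number of surviving rows per site.
rows_per_site <- map_int(filtered$data, nrow)

# Step 3: unnest; keep_empty = TRUE turns each 0-row entry into a single row of NAs.
result <- filtered %>%
  unnest(cols = c(data), keep_empty = TRUE) %>%
  ungroup()

result
```

The key piece is keep_empty = TRUE. Without it, unnest() would drop the sites whose nested data frame has no rows.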

