
Filter data by group and preserve empty groups

How can I filter my data by group and still keep the groups that end up empty?

Example:

library(dplyr)

year = c(1,2,3,1,2,3,1,2,3)
site = rep(c("a", "b", "d"), each = 3)
value = c(3,3,0,1,8,5,10,18,27)

df <- data.frame(year, site, value)

I want to subset the rows where value is greater than 5. For some groups, this is never true, and filter() simply drops those groups from the result.

How can I keep the empty groups and get NA for them instead? Ideally, I would like to use dplyr functions rather than base R.

My filtering approach, where .preserve = TRUE does not keep the empty groups in the output:

df %>% 
  group_by(site) %>% 
  filter(value > 5, .preserve = TRUE) 

Expected output:

    year site  value
  <dbl> <fct> <dbl>
1    NA a        NA
2     2 b         8
3     1 d        10
4     2 d        18
5     3 d        27
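
As background on why .preserve = TRUE does not produce this output, here is a minimal sketch, assuming a dplyr version that supports the .preserve argument. As far as the documentation describes it, .preserve only keeps the grouping structure as it was; it does not add rows. The names kept and dropped are introduced here purely for illustration.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  site  = rep(c("a", "b", "d"), each = 3),
  value = c(3, 3, 0, 1, 8, 5, 10, 18, 27)
)

# .preserve = TRUE keeps the original grouping metadata
kept <- df %>%
  group_by(site) %>%
  filter(value > 5, .preserve = TRUE)

# The default recalculates the groups from the remaining rows
dropped <- df %>%
  group_by(site) %>%
  filter(value > 5)

n_groups(kept)     # site "a" is still counted as a (zero-row) group
n_groups(dropped)  # site "a" is gone from the grouping
group_size(kept)   # the size of group "a" is 0, so no row is printed for it
```

So the empty group survives only as metadata, which is why a placeholder NA row has to be added explicitly.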

With the addition of tidyr, you can do:

df %>% 
 group_by(site) %>% 
 filter(value > 5) %>%
 ungroup() %>%
 complete(site = df$site)

  site   year value
  <fct> <dbl> <dbl>
1 a        NA    NA
2 b         2     8
3 d         1    10
4 d         2    18
5 d         3    27

Or, if you want to keep it in dplyr:

df %>% 
 group_by(site) %>% 
 filter(value > 5) %>%
 bind_rows(df %>% 
            group_by(site) %>% 
            filter(all(value <= 5)) %>%
            summarise_all(~ NA))
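
In recent dplyr releases summarise_all() is superseded by across(). A possible equivalent of the dplyr-only approach is sketched below, assuming dplyr 1.0.0 or later. The names above, placeholders and result are introduced here for illustration, and the final arrange() is an addition so that the rows come out ordered by site.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  site  = rep(c("a", "b", "d"), each = 3),
  value = c(3, 3, 0, 1, 8, 5, 10, 18, 27)
)

# Rows that pass the filter
above <- df %>%
  filter(value > 5)

# One all-NA row for each site where no value exceeds 5
placeholders <- df %>%
  group_by(site) %>%
  filter(all(value <= 5)) %>%
  summarise(across(everything(), ~ NA_real_))

# Combine both parts and order by site
result <- bind_rows(above, placeholders) %>%
  arrange(site)

result
```

Using NA_real_ rather than a bare NA keeps the placeholder columns numeric, so bind_rows() does not have to coerce types.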

Another option is to use the nesting functionality of tidyr and apply purrr::map to each nested data frame:

df %>% 
  group_by(site) %>% 
  tidyr::nest() %>% 
  mutate(data = purrr::map(data, . %>% filter(value > 5))) %>% 
  tidyr::unnest(cols=c(data), keep_empty = TRUE)
