Filter in group_by + mutate not working as in group_by + summarise in dplyr R

I am having problems filtering after mutating a grouped data frame with the tidyverse:

library(dplyr)  # attaches the %>% pipe and the dplyr verbs used below

sample.df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C"),
  value = c(1,2,3,4,5,6)
)

mutated.sample.df <- sample.df %>% 
  dplyr::group_by(group) %>% 
  dplyr::mutate(group_count = n()) 

non.desired.df <- mutated.sample.df %>% 
  dplyr::filter(group_count == max(group_count)) %>% 
  dplyr::select(-group_count)

This returns an undesired result, because no rows have been filtered out:

 group value
  <fct> <dbl>
1 A         1
2 A         2
3 A         3
4 B         4
5 B         5
6 C         6

On the other hand, when I summarise instead of mutating, the filter is applied properly:

summarized.sample.df <- sample.df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarise(group_count = n()) %>% 
  dplyr::filter(group_count == max(group_count))

giving:

 group group_count
  <fct>       <int>
1 A               3

I could now filter the original dataframe and get my desired df:

desired.df <- sample.df %>% 
  dplyr::filter(group %in% summarized.sample.df$group)

That is:

  group value
  <fct> <dbl>
1 A         1
2 A         2
3 A         3

What am I missing about the behaviour of mutate? I can make the filter work by "hard coding" the maximum first:

my.max <- max(mutated.sample.df$group_count)
desired.df <- mutated.sample.df %>% 
  dplyr::filter(group_count == my.max) %>% 
  dplyr::select(-group_count)

Would it be possible to obtain desired.df in a single pipe? So far I cannot: the summarise approach needs a second step to filter the original data frame, and the mutate approach needs the maximum computed beforehand.

I would expect this to work, but it does not. Any hint why?

mutated.sample.df <- sample.df %>% 
  dplyr::group_by(group) %>% 
  dplyr::mutate(group_count = n()) %>% 
  dplyr::filter(group_count == max(group_count)) %>% 
  dplyr::select(-group_count)

Thanks

You can do:

sample.df %>%
 add_count(group) %>%
 filter(n == max(n)) %>%
 select(-n)

  group value
1     A     1
2     A     2
3     A     3
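A minimal sketch of why this version filters as intended, assuming the same sample.df as in the question (rebuilt here so the snippet is self-contained; the name counted is only illustrative). add_count() adds the per-group row count as a column n without leaving the data frame grouped, so max(n) in the following filter() is taken over the whole table:

```r
library(dplyr)

# Same sample data as in the question
sample.df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C"),
  value = c(1, 2, 3, 4, 5, 6)
)

# add_count() appends a column n holding the number of rows per group
counted <- sample.df %>% add_count(group)

# The input was not grouped, so the result is not grouped either
is_grouped_df(counted)

# With no grouping in effect, max(n) is the largest count in the whole table
result <- counted %>%
  filter(n == max(n)) %>%
  select(-n)

print(result)
```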

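The code from @tmfmnk is better, but your pipe isn't working because you did not call ungroup() between mutate() and filter(), so filter() is evaluated within each group. Inside a group every row has the same group_count, so group_count == max(group_count) is TRUE for every row and nothing is dropped. Try: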

mutated.sample.df <- sample.df %>% 
  dplyr::group_by(group) %>% 
  dplyr::mutate(group_count = n()) %>% 
  dplyr::ungroup()  %>% 
  dplyr::filter(group_count == max(group_count)) %>% 
  dplyr::select(-group_count)
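To make the difference visible, here is a small sketch, assuming the same sample.df as in the question. The column names max_within_group and max_overall are hypothetical and exist only for this illustration; they show what max(group_count) evaluates to while the data is still grouped and after ungroup():

```r
library(dplyr)

# Same sample data as in the question
sample.df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C"),
  value = c(1, 2, 3, 4, 5, 6)
)

comparison <- sample.df %>%
  group_by(group) %>%
  mutate(
    group_count = n(),
    # Still grouped: max() only sees the rows of the current group,
    # so this always equals group_count
    max_within_group = max(group_count)
  ) %>%
  ungroup() %>%
  # Ungrouped: max() now sees every row of the table
  mutate(max_overall = max(group_count))

print(comparison)
```

Comparing group_count with max_within_group keeps every row, while comparing it with max_overall keeps only the rows of the largest group.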
