Group all rows that meet a particular condition

Question

I have the following data frame df1:

  company_location count
  <chr>            <int>
1 DE                  28
2 JP                   6
3 GB                  47
4 HN                   1
5 US                 355
6 HU                   1
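The question does not show how df1 was created. To make the answers runnable, here is a minimal sketch that rebuilds it from the printed output above. It assumes a plain base R data.frame rather than a tibble; dplyr verbs accept either.

```r
# Rebuild df1 from the printed output (assumption: the original was a tibble,
# but a data.frame with the same columns behaves the same for these answers)
df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE
)
```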

I want to get to df2:

  company_location count
  <chr>            <int>
1 DE                  28
2 GB                  47
3 US                 355
4 OTHER                8

df2 is the same as df1, but all the rows with count < 10 are summed together and aggregated into a single row called OTHER.

Does something like this exist: a group_by() function that puts all the rows matching a particular condition into one group and leaves every other row in a group of its own?

Answer 1

This is what fct_lump_min() is for: it's a function from forcats, which is part of the tidyverse.

library(tidyverse)

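# Lump every location whose total count (weighted by w) is below 10 into one level, then sum each group
df1 %>%
  group_by(company_location = fct_lump_min(company_location, min = 10, w = count)) %>%
  summarise(count = sum(count))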

#> # A tibble: 4 x 2
#>   company_location count
#>   <fct>            <int>
#> 1 DE                  28
#> 2 GB                  47
#> 3 US                 355
#> 4 Other                8
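fct_lump_min() names the lumped level "Other" by default, which is why the output above shows Other rather than the OTHER requested for df2. A minimal sketch using forcats' other_level argument to get the exact label (df1 is rebuilt from the question's printout so the block runs on its own):

```r
library(dplyr)
library(forcats)

df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE
)

# other_level replaces the default "Other" label for the lumped level
df2 <- df1 %>%
  group_by(company_location = fct_lump_min(company_location, min = 10,
                                           w = count, other_level = "OTHER")) %>%
  summarise(count = sum(count))

df2
```

Because fct_lump_min() moves the lumped level to the end of the factor levels, the OTHER row comes last, as in df2.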

Answer 2

Make a temporary variable that regroups company_location based on count, then summarise:

library(dplyr)
df1 %>%
    # Relabel every location with count < 10 as 'OTHER', group on that label, then add up the counts
    group_by(company_location = replace(company_location, count < 10, 'OTHER')) %>%
    summarise(count = sum(count))

#  company_location count
#  <chr>            <int>
#1 DE                  28
#2 GB                  47
#3 OTHER                8
#4 US                 355
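Because the grouping variable here is a plain character vector, summarise() returns the groups in alphabetical order, so OTHER lands between GB and US instead of last as in df2. If the row order matters, one option (a sketch, not part of the original answer) is to sort on a logical test afterwards; FALSE sorts before TRUE, and dplyr's arrange() keeps ties in their existing order.

```r
library(dplyr)

df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE
)

df2 <- df1 %>%
  group_by(company_location = replace(company_location, count < 10, "OTHER")) %>%
  summarise(count = sum(count)) %>%
  # Rows where the test is TRUE (the OTHER row) move to the end
  arrange(company_location == "OTHER")

df2
```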

2 answers

Answer 1: 4 votes, 2022-06-23 22:34:36

Answer 2: 2 votes, 2022-06-23 22:33:27
