Group all rows that meet a certain condition

I have the following data frame, df1:

  company_location count
  <chr>            <int>
1 DE                  28
2 JP                   6
3 GB                  47
4 HN                   1
5 US                 355
6 HU                   1
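To make the answers below reproducible, df1 can be rebuilt from the printed output. This is a sketch that assumes the six rows shown are the complete data. It uses a plain data.frame, which dplyr verbs accept just as they accept a tibble.

```r
# Rebuild df1 from the printed output above (assumes these six rows are the whole data)
df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE  # keep company_location as character, like <chr> in the tibble
)
```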

I want to get to df2:

  company_location count
  <chr>            <int>
1 DE                  28
2 GB                  47
3 US                 355
4 OTHER                8

df2 is the same as df1, except that all rows with count < 10 are summed together and aggregated into a single row called OTHER.

Does something like this exist: a group_by() function that puts all the rows matching a particular condition into one group and leaves every other row in a group of its own?

This is what fct_lump_min() is for. It's a function from forcats, which is part of the tidyverse. With count passed as the weights, it keeps every location whose total count is at least 10 and lumps the rest into a single "Other" level:

library(tidyverse)

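# lump locations whose total count is below 10 into one "Other" level, then sum per group
df1 %>%
  group_by(company_location = fct_lump_min(company_location, min = 10, w = count)) %>%
  summarise(count = sum(count))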

#> # A tibble: 4 x 2
#>   company_location count
#>   <fct>            <int>
#> 1 DE                  28
#> 2 GB                  47
#> 3 US                 355
#> 4 Other                8
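fct_lump_min() names the lumped level "Other" by default. To get the exact OTHER label shown in df2, its other_level argument can be set. A sketch, rebuilding df1 from the question's printed output (assumed to be the complete data):

```r
library(dplyr)
library(forcats)

# df1 as printed in the question (assumed to be the complete data)
df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE
)

df2 <- df1 %>%
  # keep locations whose summed count is at least 10; lump the rest into "OTHER"
  group_by(company_location = fct_lump_min(company_location, min = 10, w = count,
                                           other_level = "OTHER")) %>%
  summarise(count = sum(count))
```

fct_lump_min() moves the lumped level to the end of the factor levels, so the OTHER row comes out last rather than in alphabetical position.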

Make a temporary variable that regroups company_location based on count, then summarise:

library(dplyr)
# rows with count < 10 are relabelled 'OTHER'; every other row keeps its own location
df1 %>% 
    group_by(company_location = replace(company_location, count < 10, 'OTHER')) %>% 
    summarise(count = sum(count))

#  company_location count
#  <chr>            <int>
#1 DE                  28
#2 GB                  47
#3 OTHER                8
#4 US                 355
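The same relabel-then-sum idea also works in base R without dplyr, using ifelse() and aggregate(). A sketch, again assuming df1 holds exactly the rows shown in the question:

```r
# df1 as printed in the question (assumed to be the complete data)
df1 <- data.frame(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L),
  stringsAsFactors = FALSE  # ifelse() below needs character labels, not factor codes
)

# Relabel every location with count < 10 as "OTHER"; other rows keep their own label
grp <- ifelse(df1$count < 10, "OTHER", df1$company_location)

# Sum count within each label; aggregate() returns the groups sorted alphabetically
df2 <- aggregate(df1["count"], by = list(company_location = grp), FUN = sum)
```

As with the dplyr version, OTHER sorts alphabetically among the other locations here, because the grouping labels are plain strings rather than a factor with a custom level order.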
