Group all rows that meet a certain condition
I have the following dataframe df1:
  company_location count
  <chr>            <int>
1 DE                  28
2 JP                   6
3 GB                  47
4 HN                   1
5 US                 355
6 HU                   1
I want to get to df2:
  company_location count
  <chr>            <int>
1 DE                  28
2 GB                  47
3 US                 355
4 OTHER                8
df2 is the same as df1, except that all rows with count < 10 are summed together and collapsed into a single row called OTHER.
Does something like this exist: a group_by()-style function that puts all rows matching a particular condition into one group and leaves every other row in a group of its own?
This is what fct_lump_min is for. It's a function from forcats, which is part of the tidyverse. It lumps every level whose (weighted) count falls below a minimum into a single level:
library(tidyverse)
df1 %>%
  group_by(company_location = fct_lump_min(company_location, min = 10, w = count)) %>%
  summarise(count = sum(count))
#> # A tibble: 4 x 2
#> company_location count
#> <fct> <int>
#> 1 DE 28
#> 2 GB 47
#> 3 US 355
#> 4 Other 8
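By default the lumped level is labelled Other, not OTHER as in the question. If the exact label matters, a minimal sketch along the following lines should work. It rebuilds df1 by hand from the values shown in the question (an assumption about how the data is stored) and relies on the other_level argument of fct_lump_min:

```r
library(dplyr)
library(forcats)

# df1 rebuilt by hand from the question (assumed structure)
df1 <- tibble(
  company_location = c("DE", "JP", "GB", "HN", "US", "HU"),
  count = c(28L, 6L, 47L, 1L, 355L, 1L)
)

df2 <- df1 %>%
  # w = count weights each level by its count instead of by its number of rows;
  # other_level sets the label of the lumped level
  group_by(company_location = fct_lump_min(company_location, min = 10,
                                           w = count, other_level = "OTHER")) %>%
  summarise(count = sum(count))

df2
```

Because the result is a factor with the lumped level placed last, OTHER ends up as the final row rather than being sorted alphabetically.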
Make a temporary variable that regroups company_location based on count, then summarise:
library(dplyr)
df1 %>%
group_by(company_location = replace(company_location, count < 10, 'OTHER')) %>%
summarise(count = sum(count))
# company_location count
# <chr> <int>
#1 DE 28
#2 GB 47
#3 OTHER 8
#4 US 355
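To see what the replace() step does on its own, here is a small base R sketch. The two vectors are typed in by hand from the question, and the names location, count and grouped are only illustrative:

```r
# Values copied by hand from df1 in the question
location <- c("DE", "JP", "GB", "HN", "US", "HU")
count    <- c(28L, 6L, 47L, 1L, 355L, 1L)

# replace() overwrites the elements where the condition is TRUE
grouped <- replace(location, count < 10, "OTHER")
grouped
# "DE" "OTHER" "GB" "OTHER" "US" "OTHER"

# Summing count within each label mirrors group_by() + summarise()
totals <- tapply(count, grouped, sum)
totals
```

Since grouped is a plain character vector, the groups come out in alphabetical order, which is why OTHER appears before US in the output above.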