R - Filter a dataframe to include only rows whose column-value count meets a condition

Question

Consider this dataframe:

country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

I want to keep only the rows whose country appears more than 4 times or fewer than 2 times. In this case, the result would be:

country  number
USA      1
USA      2
USA      3
USA      4
USA      5
Canada   9

I can get it to work with this:

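library(dplyr)
totalcounts <- filter(count(df, country), n > 4 | n < 2) # countries whose count passes the test
result <- df[0, ]                                         # empty data frame with df's columns
for (i in seq_len(nrow(totalcounts))) {
  # rbind the rows belonging to the i-th matching country
  result <- rbind(result, df[df$country == totalcounts$country[i], ])
}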

But I feel there has to be an easier way. I haven't got the hang of sapply and the like yet, so I feel like I'm missing something here. It seems like I'm taking the long way around when something already exists that does this.

Answer 1

Here is a base R option using subset + ave:

subset(df, !ave(number, country, FUN = function(x) length(x) %in% 2:4))

Or a shorter version (thanks to @Onyambu):

subset(df, !ave(number, country, FUN = length) %in% 2:4)

Both give:

  country number
1     USA      1
2     USA      2
3     USA      3
4     USA      4
5     USA      5
9  Canada      9
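To see why this works, it may help to look at the intermediate vector. ave() returns a vector as long as its first argument, with each element replaced by FUN applied to that element's group, so FUN = length gives each row's group size. A minimal sketch (group_size, keep and result are just illustrative names):

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# ave() replaces every element with FUN applied to its group,
# so FUN = length gives each row's group size
group_size <- ave(df$number, df$country, FUN = length)
print(group_size)  # 5 5 5 5 5 3 3 3 1

# %in% binds tighter than !, but the parentheses make the intent explicit
keep <- !(group_size %in% 2:4)
result <- df[keep, ]
print(result)
```

Because ave() writes the results back into a copy of its first argument, the counts take that argument's type: passing the integer number column keeps them numeric, whereas passing the character country column would return the counts as strings.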

Answer 2

We can group by country and then filter:

library(dplyr)
df %>% 
   group_by(country) %>% 
   filter(n() > 4|n() < 2)
# A tibble: 6 x 2
# Groups:   country [2]
#  country number
#  <chr>    <int>
#1 USA          1
#2 USA          2
#3 USA          3
#4 USA          4
#5 USA          5
#6 Canada       9
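For comparison, the same group-then-filter idea can be sketched in base R with split(), Filter() and do.call(rbind, ...). This is a swapped-in technique, not part of the answer above, but it shows the kind of apply-style approach the question was hinting at:

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# one data frame per country (split() orders the groups by sorted country)
groups <- split(df, df$country)

# keep only the groups whose row count is > 4 or < 2
kept <- Filter(function(g) nrow(g) > 4 || nrow(g) < 2, groups)

# stack the surviving groups back into one data frame
# (unname() drops the country names from the list first)
result <- do.call(rbind, unname(kept))

# re-sort by number, which in this example matches the original row order
result <- result[order(result$number), ]
print(result)
```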

Another option is to add a column of counts with add_count, filter on it, and then drop it:

df %>%
    add_count(country) %>% 
    filter(n > 4|n < 2) %>% 
    select(-n)
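add_count() appends a per-row count column. A base R sketch of the same add-a-count-then-filter pattern, assuming a lookup into table() is acceptable (the helper column name n mirrors add_count's default; df_n and result are illustrative names):

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

counts <- table(df$country)  # one count per distinct country

# look up each row's country count by name (as.character guards against factors)
df_n <- df
df_n$n <- as.vector(counts[as.character(df_n$country)])

# filter on the helper column, then drop it, like select(-n)
result <- df_n[df_n$n > 4 | df_n$n < 2, ]
result$n <- NULL
print(result)
```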

Or use count and join the qualifying countries back onto the original data:

df %>%
    count(country) %>% 
    filter(n >4 |n <2) %>% 
    select(country) %>% 
    inner_join(df, by = "country")
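The join version can be mirrored in base R with merge(). A minimal sketch, assuming an inner join on country is what's wanted; the count table is built with as.data.frame() on a table(), and counts and keep are illustrative names:

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# count table with columns country and n, like count(country)
counts <- as.data.frame(table(country = df$country),
                        responseName = "n", stringsAsFactors = FALSE)

# keep only the qualifying countries, like filter() + select(country)
keep <- counts[counts$n > 4 | counts$n < 2, "country", drop = FALSE]

# inner join back onto the original rows, like inner_join(df, by = "country")
result <- merge(keep, df, by = "country")
print(result)
```

merge() sorts by the key by default, so Canada comes before USA here; the dplyr join above should likewise list Canada first, since count() returns the countries in sorted order.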

Answer 3

A base R option using table:

tab <- table(df$country)
subset(df, country %in% names(tab[tab > 4 | tab < 2]))

#  country number
#1     USA      1
#2     USA      2
#3     USA      3
#4     USA      4
#5     USA      5
#9  Canada      9
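The same table() idea can be wrapped in a small reusable function so that the grouping column and the condition become arguments. filter_by_group_size() is a hypothetical helper written for illustration, not part of base R:

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# hypothetical helper: keep rows whose group (defined by column `col`)
# has a size for which `keep_if` returns TRUE
filter_by_group_size <- function(data, col, keep_if) {
  tab <- table(data[[col]])                    # group sizes, named by group
  keep <- names(tab)[keep_if(as.vector(tab))]  # groups passing the test
  data[as.character(data[[col]]) %in% keep, , drop = FALSE]
}

result <- filter_by_group_size(df, "country", function(n) n > 4 | n < 2)
print(result)
```

Passing a different predicate, such as function(n) n == 3, would keep only the UK rows.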

3 answers

Answer 1: 7 votes, 2020-07-31 22:55:52

Answer 2: 3 votes, accepted, 2020-07-31 22:41:46

Answer 3: 3 votes, 2020-08-01 00:53:38
