Different filtering rules for groups using dplyr

Question

Sample data:

df <- data.frame(loc.id = rep(1:2, each = 11),
                 x = c(35,51,68,79,86,90,92,93,95,98,100,35,51,68,79,86,90,92,92,93,94,94))

For each loc.id, I want to filter out x <= 95. I tried:

library(dplyr)
df %>% group_by(loc.id) %>% filter(row_number() <= which.max(x >= 95))

   loc.id     x
    <int> <dbl>
 1      1    35
 2      1    51
 3      1    68
 4      1    79
 5      1    86
 6      1    90
 7      1    92
 8      1    93
 9      1    95
10      2    35

However, in group 2 all the values are less than 95, so I want to keep all values of x for group 2. The line above does not do that: it keeps only the first row of group 2.
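The reason group 2 shrinks to a single row can be checked in base R. When which.max() is given a logical vector with no TRUE values, the maximum is FALSE, and the first element is the first maximum, so it returns 1. Below is a minimal check that uses the group 2 values from the sample data. The variable names are illustrative only.

```r
# Group 2 values from the sample data: none of them reach 95
x2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)
cond <- x2 >= 95

any(cond)                   # FALSE: there is no row at which to stop
idx <- which.max(cond)
idx                         # 1: with no TRUE, the first element is the maximum
sum(seq_along(x2) <= idx)   # 1, so row_number() <= idx keeps a single row
```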

Answer 1

Perhaps something like this?

library(dplyr)
df %>%
    group_by(loc.id) %>%
    mutate(n = sum(x > 95)) %>%   # number of values above 95 in each group
    filter(n == 0 | x > 95) %>%   # keep the whole group if none exceed 95
    ungroup() %>%
    select(-n)
# A tibble: 13 x 2
#   loc.id     x
#    <int> <dbl>
# 1      1   98.
# 2      1  100.
# 3      2   35.
# 4      2   51.
# 5      2   68.
# 6      2   79.
# 7      2   86.
# 8      2   90.
# 9      2   92.
#10      2   92.
#11      2   93.
#12      2   94.
#13      2   94.

Note that removing entries where x <= 95 is the same as retaining entries where x > 95 (not x >= 95).
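The rule in this answer says to keep x > 95, unless the group has no such value, in which case the whole group is kept. That rule can also be sketched without dplyr. This is an illustrative base-R alternative, not part of the original answer. It uses ave() to spread each group's count back onto every row of that group. The names n_big, keep and res are hypothetical.

```r
df <- data.frame(loc.id = rep(1:2, each = 11),
                 x = c(35,51,68,79,86,90,92,93,95,98,100,
                       35,51,68,79,86,90,92,92,93,94,94))

# Per-group count of values strictly above 95, repeated on every row of the group
n_big <- ave(as.numeric(df$x > 95), df$loc.id, FUN = sum)

# Keep a row if its group has nothing above 95, or if the row itself is above 95
keep <- n_big == 0 | df$x > 95
res <- df[keep, ]
nrow(res)   # 13: 98 and 100 from group 1, plus all 11 rows of group 2
```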

Answer 2

You can use match to get the index of the first TRUE and, through the nomatch parameter, return the length of the group when no match is found:

df %>% 
    group_by(loc.id) %>% 
    filter(row_number() <= match(TRUE, x >= 95, nomatch=n()))

# A tibble: 20 x 2
# Groups:   loc.id [2]
#   loc.id     x
#    <int> <dbl>
# 1      1    35
# 2      1    51
# 3      1    68
# 4      1    79
# 5      1    86
# 6      1    90
# 7      1    92
# 8      1    93
# 9      1    95
#10      2    35
#11      2    51
#12      2    68
#13      2    79
#14      2    86
#15      2    90
#16      2    92
#17      2    92
#18      2    93
#19      2    94
#20      2    94
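The match(TRUE, cond, nomatch = n) idiom can be checked on plain vectors. It returns the position of the first TRUE, or the fallback value when there is none. first_stop is a hypothetical helper name, used here only to illustrate the idea.

```r
# Position of the first value >= threshold, or the vector length if none qualifies
first_stop <- function(x, threshold = 95) {
  match(TRUE, x >= threshold, nomatch = length(x))
}

x1 <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)  # group 1
x2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)   # group 2

first_stop(x1)                 # 9: the row holding 95
first_stop(x2)                 # 11: no match, so the whole group is kept
x1[seq_len(first_stop(x1))]    # 35 51 68 79 86 90 92 93 95
```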

Or use the negated, lagged cumsum as the filter condition:

df %>% group_by(loc.id) %>% filter(!lag(cumsum(x >= 95), default=FALSE))
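Here is the same idea worked step by step on group 1. The lagged cumulative sum stays at 0 up to and including the first row with x >= 95, and it is positive after that. Negating it therefore keeps exactly the rows up to that point. This base-R sketch imitates lag() with c(0L, head(..., -1)), on the assumption that a shift by one with a leading 0 behaves like lag(..., default = FALSE) in this context.

```r
x1 <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)  # group 1
x2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)   # group 2

hits <- cumsum(x1 >= 95)         # 0 0 0 0 0 0 0 0 1 2 3
prev <- c(0L, head(hits, -1))    # shifted by one: 0 0 0 0 0 0 0 0 0 1 2
keep <- !prev                    # TRUE while no earlier row reached 95
x1[keep]                         # 35 51 68 79 86 90 92 93 95

# Group 2 never reaches 95, so the lagged sum is 0 everywhere and every row is kept
all(!c(0L, head(cumsum(x2 >= 95), -1)))   # TRUE
```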

Answer 3

A solution using all along with the dplyr package:

library(dplyr)
df %>% group_by(loc.id) %>%
  filter((x > 95) | all(x<=95))  # All x in group are <= 95 OR x > 95

# # Groups: loc.id [2]
# loc.id     x
# <int> <dbl>
# 1      1  98.0
# 2      1 100  
# 3      2  35.0
# 4      2  51.0
# 5      2  68.0
# 6      2  79.0
# 7      2  86.0
# 8      2  90.0
# 9      2  92.0
# 10      2  92.0
# 11      2  93.0
# 12      2  94.0
# 13      2  94.0
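The predicate in this answer can also be applied per group in base R using split() and unsplit(). When x has no missing values, all(x <= 95) is TRUE exactly when sum(x > 95) == 0. It therefore selects the same 13 rows as Answer 1. This is a sketch that does not use dplyr, and rule is a hypothetical name.

```r
df <- data.frame(loc.id = rep(1:2, each = 11),
                 x = c(35,51,68,79,86,90,92,93,95,98,100,
                       35,51,68,79,86,90,92,92,93,94,94))

# TRUE for rows above 95, or for every row of a group with nothing above 95
rule <- function(x) (x > 95) | all(x <= 95)

# Apply the rule within each loc.id, then put the results back in row order
keep <- unsplit(lapply(split(df$x, df$loc.id), rule), df$loc.id)
res <- df[keep, ]
nrow(res)   # 13
```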

3 answers

Answer 1
Score 2, accepted, 2018-05-30 21:36:50

Answer 2
Score 0, 2018-05-30 21:39:55

Answer 3
Score 0, 2018-05-30 21:48:00