
Filter rows based on multiple conditions using dplyr

df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I did this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expected a data frame like this:

      loc.id threshold
        1       2
        1       4
        2       2
        2       4

But it returns an empty data frame.
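The empty result follows from how filter() treats comma-separated conditions: they are combined with AND, so a row would have to sit at both positions at once. A minimal sketch of the difference, reusing the df from the question (the names `both` and `either` are only for illustration):

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# Comma-separated conditions are ANDed: no row number can equal both
# which.max() positions, so every row is dropped.
both <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2),
         row_number() == which.max(threshold >= 4))
nrow(both)

# Combining the same conditions with | keeps a row that matches either position.
either <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2) |
         row_number() == which.max(threshold >= 4)) %>%
  ungroup()
either
```

With this sample data, `either` holds the rows with threshold 2 and 4 for each loc.id.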

Based on the condition, we can slice the rows by concatenating the two which.max indices and taking the unique values (if a group only has threshold values of 4 or more, both conditions return the same index).

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check
    #slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # based on the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4

Note that some groups may have no threshold values greater than or equal to 2. The filter(any(threshold >= 2)) step keeps only the groups that have at least one such value.
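The check matters because which.max() on an all-FALSE vector returns 1, so a group with no match would otherwise contribute its first row. The same happens when a group has values >= 2 but none >= 4. As a possible variation (not part of the answer above), the sketch below uses which(...)[1], which gives NA when nothing matches. The helper `first_hits` and the data frame `df2` are hypothetical and exist only for this illustration.

```r
library(dplyr)

# With no TRUE present, the first element counts as the maximum.
which.max(c(FALSE, FALSE, FALSE))

# Hypothetical helper: index of the first element >= each cutoff,
# dropping cutoffs that are never reached.
first_hits <- function(x, cutoffs) {
  idx <- vapply(cutoffs, function(k) which(x >= k)[1], integer(1))
  unique(idx[!is.na(idx)])
}

# Hypothetical data: group 1 never reaches 2, group 3 never reaches 4.
df2 <- data.frame(
  loc.id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  threshold = c(0, 1, 1, 1, 3, 5, 1, 3, 3)
)

res <- df2 %>%
  group_by(loc.id) %>%
  slice(first_hits(threshold, c(2, 4))) %>%
  ungroup()
res
```

Here group 1 yields no rows, group 2 yields its rows with threshold 3 and 5, and group 3 yields only its row with threshold 3.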

If this isn't what you want, assign the df below a name and use it to filter your dataset.

df %>% 
  distinct() %>% 
  filter(threshold ==2 | threshold==4)
#>   loc.id threshold
#> 1      1         2
#> 2      1         4
#> 3      2         2
#> 4      2         4
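One way to read "use it to filter your dataset" is a semi-join against the saved result. This is an assumption about the intent, and the names `keep` and `big` plus the `value` column are hypothetical. Note that this approach matches the exact values 2 and 4 rather than the first row at or above each cutoff.

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# Save the distinct (loc.id, threshold) pairs of interest under a name.
keep <- df %>%
  distinct() %>%
  filter(threshold %in% c(2, 4))

# Hypothetical larger dataset carrying an extra column.
big <- data.frame(
  loc.id    = c(1L, 1L, 2L, 2L),
  threshold = c(2L, 3L, 4L, 5L),
  value     = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# semi_join() keeps the rows of big that have a matching pair in keep.
res <- semi_join(big, keep, by = c("loc.id", "threshold"))
res
```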
