Filter rows based on multiple conditions using dplyr

library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I tried this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expected a dataframe like this:

      loc.id threshold
        1       2
        1       4
        2       2
        2       4

But it returns an empty data frame.

Based on the conditions, we can slice the rows by concatenating the two which.max() indices and taking the unique values (if a group only has values of 4 or more, for example, both conditions return the same index).

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check: drop groups with no threshold >= 2
    # variant with strict inequalities:
    # slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # >= matches the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4

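Note that some groups may have no threshold values greater than or equal to 2. For such a group, which.max() on an all-FALSE vector returns 1, so the first row would be picked by mistake. The filter(any(threshold >= 2)) step guards against this by keeping only the groups that have at least one such value.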
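The same caveat applies to a group that reaches 2 but never reaches 4. As a minimal sketch of a more defensive variant, the snippet below swaps which.max() for a small helper that returns an empty index when nothing matches. The helper name first_true() and the data frame df2 are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical data: loc.id 3 reaches a threshold of 2 but never 4
df2 <- data.frame(loc.id    = rep(1:3, each = 3),
                  threshold = c(1, 2, 4,  2, 3, 5,  1, 2, 3))

# which.max() on an all-FALSE vector still returns 1
no_match <- which.max(c(FALSE, FALSE))

# Hypothetical helper: index of the first TRUE, or integer(0) if there is none
first_true <- function(x) head(which(x), 1)

res <- df2 %>%
  group_by(loc.id) %>%
  # an empty index simply contributes no row for that condition
  slice(unique(c(first_true(threshold >= 2), first_true(threshold >= 4)))) %>%
  ungroup()

print(res)
```

With this approach, a group that never meets a condition contributes no row for that condition, so no separate any() check is needed.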

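Alternatively, if the exact threshold values you need are known, you can filter on them directly. If this isn't quite what you want, assign the result below to a name and use it to filter your dataset.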

df %>% 
  distinct() %>% 
  filter(threshold == 2 | threshold == 4)
#>   loc.id threshold
#> 1      1         2
#> 2      1         4
#> 3      2         2
#> 4      2         4
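The | in that filter() call also explains why the attempt in the question came back empty: filter() combines comma-separated conditions with a logical AND, and no row can be both the first with threshold >= 2 and the first with threshold >= 4 in this data. As a rough sketch, assuming the same df as in the question, the position-based conditions can be joined with | instead:

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# Comma-separated conditions are ANDed: no row satisfies both, so nothing is kept
res_and <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2),
         row_number() == which.max(threshold >= 4)) %>%
  ungroup()

# Joining the same conditions with | keeps a row that satisfies either one
res_or <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2) |
           row_number() == which.max(threshold >= 4)) %>%
  ungroup()

print(nrow(res_and))
print(res_or)
```

Unlike matching on the exact values 2 and 4, this version still works when those exact values are absent, as long as each group has at least one row meeting each condition.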
