Filter rows based on multiple conditions using dplyr

Question

df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I tried this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expected a dataframe like this:

      loc.id threshold
        1       2
        1       4
        2       2
        2       4

But it returns an empty data frame.

Answer 1

Based on the condition, we can slice the rows by concatenating the two which.max() indices and taking the unique values. If a group only has threshold values of at least 4, both conditions return the same index, and unique() drops the duplicate.

library(dplyr)

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check: keep groups with at least one value >= 2
    # strict-inequality variant:
    # slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # >= matches the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4
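
As for why the attempt in the question returns nothing: conditions separated by commas in filter() are combined with AND, so a row would need its row_number() to equal both which.max() indices at once. A minimal sketch of the same attempt with the two conditions joined by | instead (the name res is only for illustration):

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

res <- df %>%
  group_by(loc.id) %>%
  # keep a row if it is the first one with threshold >= 2
  # OR the first one with threshold >= 4
  filter(row_number() == which.max(threshold >= 2) |
         row_number() == which.max(threshold >= 4)) %>%
  ungroup()
```

On the example data this should keep the same four rows as the slice() version.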

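Note that a group may have no threshold values greater than or equal to 2. In that case which.max() still returns 1 (the first row), so the filter(any(threshold >= 2)) step keeps only the groups that have at least one such value.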
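
To illustrate why the additional check matters, here is a small sketch on made-up data (df2 and its loc.id 3 group are hypothetical, not from the question). which.max() on an all-FALSE vector returns 1, so without the check the first row of such a group would be picked by mistake.

```r
library(dplyr)

# Hypothetical data: loc.id 3 has no threshold value >= 2
df2 <- data.frame(loc.id = c(1, 1, 1, 3, 3), threshold = c(1, 2, 4, 0, 1))

# which.max() of an all-FALSE vector is 1, not an empty result
idx_no_match <- which.max(c(FALSE, FALSE))

guarded <- df2 %>%
  group_by(loc.id) %>%
  filter(any(threshold >= 2)) %>% # drops the loc.id 3 group entirely
  slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4)))) %>%
  ungroup()
```

Here guarded should contain only the two rows for loc.id 1.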

Answer 2

If you only need the rows where threshold is exactly 2 or 4, the following works. If this isn't what you want, assign the data frame below to a name and use it to filter your dataset.

df %>% 
  distinct() %>% 
  filter(threshold == 2 | threshold == 4)
#>   loc.id threshold
#> 1      1         2
#> 2      1         4
#> 3      2         2
#> 4      2         4
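
One possible reading of "assign it a name and use it to filter your dataset" is a semi join against the stored result. This is only a sketch of that interpretation; the names keys and filtered are made up for illustration.

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# store the distinct (loc.id, threshold) pairs of interest
keys <- df %>%
  distinct() %>%
  filter(threshold == 2 | threshold == 4)

# keep only the rows of df that have a match in keys
filtered <- semi_join(df, keys, by = c("loc.id", "threshold"))
```

Unlike the which.max() approach, this matches exact values, so it only gives the same rows when 2 and 4 actually occur in every group.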

solution1
2 ACCEPTED 2018-06-03 15:00:32

solution2
1 2018-06-03 15:39:54
