This feels like a common enough task that I assume there's an established function or method for it. I'm imagining something like dplyr::filter_after(), but there doesn't seem to be one.
Here's the method I'm using as a starting point:
#Setup:
library(dplyr)
threshold <- 3
test.df <- data.frame("num"=c(1:5,1:5),"let"=letters[1:10])
#Drop the first row where num reaches the threshold (3) and every row after it:
out.df <- test.df %>%
  mutate(pastThreshold = cumsum(num>=threshold)) %>%
  filter(pastThreshold==0) %>%
  dplyr::select(-pastThreshold)
This produces the desired output:
> out.df
num let
1 1 a
2 2 b
Is there another solution that's less verbose?
You can do:
test.df %>%
  slice(seq_len(which.max(num >= threshold) - 1))
num let
1 1 a
2 2 b
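One caveat with the which.max() approach: on a logical vector with no TRUE values, which.max() still returns 1, so the slice keeps zero rows when the threshold is never reached. Whether that is what you want depends on your data. Below is a minimal sketch of a guarded variant; the threshold of 10 and the n_keep helper variable are only illustrative assumptions, not part of the original answer.

```r
library(dplyr)

threshold <- 10  # assumed value that the sample data never reaches
test.df <- data.frame(num = c(1:5, 1:5), let = letters[1:10])

hit <- test.df$num >= threshold
first_idx <- which.max(hit)  # 1, even though no element of hit is TRUE

# Guarded variant: keep every row when the threshold is never reached,
# otherwise keep the rows before the first hit
n_keep <- if (any(hit)) which.max(hit) - 1 else nrow(test.df)
out.df <- test.df %>% slice(seq_len(n_keep))
```

With the guard in place, out.df contains all ten rows here, which matches what the cumsum-based filter from the question would return in the same situation.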
We can use the same condition directly in filter(), without creating an extra column and removing it later:
library(dplyr)
test.df %>%
filter(cumsum(num>=threshold) == 0)
# num let
#1 1 a
#2 2 b
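Another option is match() with slice(). Note that this looks up the value threshold - 1, so it assumes num is a run of consecutive integers that contains that value:
test.df %>%
  slice(seq_len(match(threshold-1, num)))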
Or another option is rleid() from data.table:
library(data.table)
test.df %>%
filter(rleid(num >= threshold) == 1)
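The rleid() version and the cumsum() filter agree on the sample data, but they can differ when the very first row already meets the threshold: rleid() numbers the leading run as 1 whatever its value, so that run is kept. A small sketch of the difference, using a hypothetical edge.df that is not from the question:

```r
library(dplyr)
library(data.table)

threshold <- 3
# Hypothetical data whose first row already meets the threshold
edge.df <- data.frame(num = c(3, 4, 1, 2), let = letters[1:4])

# rleid() labels the leading run (3, 4) as run 1, so those two rows are kept
by_rleid <- edge.df %>% filter(rleid(num >= threshold) == 1)

# The cumulative-sum filter drops everything from the first hit onward,
# which here means every row
by_cumsum <- edge.df %>% filter(cumsum(num >= threshold) == 0)
```

If the first row can never be at or above the threshold in your data, the two approaches should behave the same.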
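dplyr provides the window functions cumall() and cumany() for this. Inside filter(), cumall() keeps all rows until a condition becomes false for the first time, and cumany() keeps all rows from the point where a condition becomes true for the first time.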
test.df %>%
filter(cumall(num<threshold)) #all rows until condition violated for first time
# num let
# 1 1 a
# 2 2 b
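To make the difference between the two functions concrete, here is a short sketch on the same sample data. The names before and from_first_hit are just illustrative.

```r
library(dplyr)

threshold <- 3
test.df <- data.frame(num = c(1:5, 1:5), let = letters[1:10])

# cumall(): keep rows while the condition has held for every row so far
before <- test.df %>% filter(cumall(num < threshold))

# cumany(): keep rows from the first row where the condition is TRUE onward
from_first_hit <- test.df %>% filter(cumany(num >= threshold))
```

Here before holds rows a and b, and from_first_hit holds the remaining eight rows starting at c, so the two results together partition the original data frame.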