
Find three consecutive numbers greater than threshold group-wise in R

How can I get, within each group, the index of the sample whose preceding consecutive samples were all greater than a fixed threshold?

In the example below, I need to find, for each group, the time of the sample that follows 3 consecutive samples whose speed is greater than 35 (Speed > 35).

speed_threshold = 35

Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 # Speed > 35
1       25      42 # Speed > 35
1       30      52 # Speed > 35
1       35      48 # <--- Return time = 35 as answer for Group 1 !
1       40      45
2       5       22
2       10      36 # Speed > 35
2       15      38 # Speed > 35
2       20      46 # Speed > 35
2       25      53 # <--- Return time = 25 as answer for Group 2 !
3       5       45
3       10      58 # <--- Return time = NA as answer for group 3 !

For each row, if the speed is above the threshold and it is the third such value in a row, capture its index in ends. Select the first index in ends and add one to get the index of the return time. (There may be more than one such run of 3, and therefore more than one element in ends. In that case, the first element is used.)

Note: In your example, the speed at return time is always above the threshold. This code does not check that as a condition at all, but simply gives the first time after three rows with speeds above threshold (regardless of whether the speed at that time is still above the threshold).


library(data.table)

speed_thresh <- 35

df[, {above <- Speed > speed_thresh
      ends <- which(above & rowid(rleid(above)) == 3)
      .(Return_Time = Time[ends[1] + 1])}
   , Group]

#    Group Return_Time
# 1:     1          35
# 2:     2          25
# 3:     3          NA
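
To see what the grouped expression computes, here is a base R sketch of the same steps for a single group (Group 1 from the question). It swaps data.table's rowid(rleid(above)) for sequence(rle(above)$lengths), which likewise numbers the positions within each run; the variable names run_pos and return_time are only illustrative.

```r
speed_thresh <- 35
Time  <- c(5, 10, 15, 20, 25, 30, 35, 40)
Speed <- c(25, 23, 21, 40, 42, 52, 48, 45)

above   <- Speed > speed_thresh           # F F F T T T T T
run_pos <- sequence(rle(above)$lengths)   # position within each run: 1 2 3 1 2 3 4 5
ends    <- which(above & run_pos == 3)    # rows that are the third above-threshold value of a run
return_time <- Time[ends[1] + 1]          # row right after the first such run of three
return_time
# 35
```

As the Note says, the row at ends[1] + 1 is not itself checked: with Speed <- c(40, 42, 52, 20) and Time <- c(5, 10, 15, 20), the same steps return 20 even though the speed at that time is below the threshold.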

Data used:

df <- fread('
Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 
1       25      42 
1       30      52 
1       35      48 
1       40      45
2       5       22
2       10      36 
2       15      38 
2       20      46 
2       25      53 
3       5       45
3       10      58
')

One option is to use rleid to create a grouping variable from the condition on 'Speed', then filter the runs where the number of rows ( n() ) is at least 3 and every 'Speed' is greater than 35.

library(dplyr)
library(data.table)  # provides rleid()
df1 %>%  # df1: the data frame from the question
  group_by(Group, grp = rleid(Speed > speed_threshold)) %>%
  filter(n() >= 3, all(Speed > speed_threshold))
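
The filter keeps every row of each qualifying run rather than a single return time per group. As a rough sketch of one way to get a return time per group in dplyr (an assumption about the goal, not necessarily how the original answer continued), the version below replaces the rleid grouping with lagged comparisons. The column names above and after3 and the result name res are illustrative.

```r
library(dplyr)

speed_threshold <- 35
df1 <- data.frame(
  Group = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
  Time  = c(5, 10, 15, 20, 25, 30, 35, 40, 5, 10, 15, 20, 25, 5, 10),
  Speed = c(25, 23, 21, 40, 42, 52, 48, 45, 22, 36, 38, 46, 53, 45, 58)
)

res <- df1 %>%
  group_by(Group) %>%
  mutate(
    above = Speed > speed_threshold,
    # TRUE when the three preceding rows of this group were all above the threshold
    after3 = dplyr::lag(above, 1, default = FALSE) &
             dplyr::lag(above, 2, default = FALSE) &
             dplyr::lag(above, 3, default = FALSE)
  ) %>%
  # first qualifying row per group; which(...)[1] is NA when there is none
  summarise(Return_Time = Time[which(after3)[1]])
```

Groups with no run of three (Group 3 here) get NA because indexing Time with an NA index yields NA.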

1) Using DF defined reproducibly in the Note at the end, define a function ok which takes a logical vector indicating whether each speed is greater than 35 and returns a logical vector of the same length which is TRUE for the first speed that comes after 3 consecutive TRUEs. Apply that to each group using ave and subset DF down to those rows which are TRUE, giving s.

If just returning the groups which satisfy the condition is sufficient then we are done; otherwise, define Groups as a one column data frame with one row per Group and merge that with s so that we get an NA for those groups not satisfying the condition.


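library(zoo)

# TRUE only at the first position whose three preceding values are all TRUE
ok <- function(x) {
  r <- rollapplyr(x, list(-(1:3)), all, fill = FALSE)
  r & cumsum(r) == 1
}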
s <- subset(DF, ave(Speed > 35, Group, FUN = ok))

Groups <- data.frame(Group = unique(DF$Group))
merge(Groups, s, all.x = TRUE)[1:2]
##   Group Time
## 1     1   35
## 2     2   25
## 3     3   NA
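
If zoo is not available, the same logic can be sketched in base R. Here first_after_three is a hypothetical helper (not part of the original answer) that builds the three lagged copies by hand; the rest mirrors the ave, subset and merge steps described above.

```r
DF <- data.frame(
  Group = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
  Time  = c(5, 10, 15, 20, 25, 30, 35, 40, 5, 10, 15, 20, 25, 5, 10),
  Speed = c(25, 23, 21, 40, 42, 52, 48, 45, 22, 36, 38, 46, 53, 45, 58)
)

# Hypothetical helper: TRUE only at the first position preceded by three TRUEs
first_after_three <- function(x) {
  n <- length(x)
  # x shifted down by k positions, padded with FALSE at the start
  shifted <- function(k) c(rep(FALSE, min(k, n)), head(x, max(n - k, 0)))
  r <- shifted(1) & shifted(2) & shifted(3)
  r & cumsum(r) == 1
}

hit <- as.logical(ave(DF$Speed > 35, DF$Group, FUN = first_after_three))
s   <- DF[hit, c("Group", "Time")]

# Left join on Group so that groups without a hit show up with NA
res <- merge(data.frame(Group = unique(DF$Group)), s, all.x = TRUE)
```

The min() and max() guards keep the helper safe for groups shorter than three rows, such as Group 3.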

2) A second approach is to split DF by group and then perform the calculation over each component of the split.


calc <- function(x) {
  r <- rollapplyr(x$Speed > 35, list(-(1:3)), all, fill = FALSE)
  # Time at the first row preceded by three speeds above 35, or NA if there is none
  c(x$Time[which(cumsum(r) == 1)], NA)[1]
}
sapply(split(DF, DF$Group), calc)
##  1  2  3 
## 35 25 NA 


Note

Lines <- "Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 # Speed > 35
1       25      42 # Speed > 35
1       30      52 # Speed > 35
1       35      48 # <--- Return time = 35 as answer for Group 1 !
1       40      45
2       5       22
2       10      36 # Speed > 35
2       15      38 # Speed > 35
2       20      46 # Speed > 35
2       25      53 # <--- Return time = 25 as answer for Group 2 !
3       5       45
3       10      58 # <--- Return time = NA as answer for group 3 !"
DF <- read.table(text = Lines, header = TRUE)

