
Find three consecutive numbers greater than threshold group-wise in R

Within each group, how can I get the index of the first sample that is preceded by several consecutive samples that are all greater than a fixed threshold?

In the example below, I need to find, for each group, the time of the first sample that follows 3 consecutive samples whose speed is greater than 35.

speed_threshold = 35

Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 # Speed > 35
1       25      42 # Speed > 35
1       30      52 # Speed > 35
1       35      48 # <--- Return time = 35 as answer for Group 1 !
1       40      45
2       5       22
2       10      36 # Speed > 35
2       15      38 # Speed > 35
2       20      46 # Speed > 35
2       25      53 # <--- Return time = 25 as answer for Group 2 !
3       5       45
3       10      58 # <--- Return time = NA as answer for group 3 !

If a value is above the threshold and is the third such value in a row, its index is captured in ends. The first index in ends, plus one, is the index of the return time. (There may be more than one run of 3, and therefore more than one element in ends; in that case the first one is used.)

Note: In your example, the speed at return time is always above the threshold. This code does not check that as a condition at all, but simply gives the first time after three rows with speeds above threshold (regardless of whether the speed at that time is still above the threshold).

library(data.table)
setDT(df)

speed_thresh <- 35

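df[, {above <- Speed > speed_thresh                    # TRUE where speed exceeds the threshold
      ends <- which(above & rowid(rleid(above)) == 3)  # rows that are the 3rd TRUE of a run
      .(Return_Time = Time[ends[1] + 1])}              # time of the row after the first such run
   , Group]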


#    Group Return_Time
# 1:     1          35
# 2:     2          25
# 3:     3          NA
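
To make the run-length step easier to follow, here is a minimal base-R sketch for Group 1 alone. It assumes that sequence(rle(above)$lengths) is an acceptable stand-in for rowid(rleid(above)), since both number the elements within each run of equal values. The names time, speed and pos_in_run are illustrative only.

```r
# Group 1 from the example data
time  <- c(5, 10, 15, 20, 25, 30, 35, 40)
speed <- c(25, 23, 21, 40, 42, 52, 48, 45)
speed_thresh <- 35

above <- speed > speed_thresh

# Position of each element within its run of identical values
# (assumed base-R stand-in for rowid(rleid(above)))
pos_in_run <- sequence(rle(above)$lengths)

# Rows that are the third above-threshold value in a row
ends <- which(above & pos_in_run == 3)

# Time of the row after the first such run; NA if there is no such row
return_time <- time[ends[1] + 1]
return_time
```

For Group 1, ends is 6 and return_time is 35. If no run of three exists, ends is empty, ends[1] is NA, and the result is NA.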

Data used:

df <- fread('
Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 
1       25      42 
1       30      52 
1       35      48 
1       40      45
2       5       22
2       10      36 
2       15      38 
2       20      46 
2       25      53 
3       5       45
3       10      58 
')

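One option is to use rleid to create a grouping variable from the condition on 'Speed', then keep the runs that contain at least 3 rows ( n() >= 3 ) and in which every 'Speed' is greater than 35. Here df1 is assumed to hold the example data and speed_threshold is 35, as in the question. Note that this returns the first three rows of each qualifying run, not the return time itself.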

library(dplyr)
library(data.table)
df1 %>%
  group_by(Group, grp = rleid(Speed > speed_threshold)) %>%
  filter(n() >= 3, all(Speed > speed_threshold)) %>%
  slice(1:3)

1) Using DF, defined reproducibly in the Note at the end, define a function ok that takes a logical vector indicating whether each speed is greater than 35 and returns a logical vector of the same length that is TRUE for the first speed that comes after 3 consecutive TRUEs. Apply it to each group using ave and subset DF down to the rows for which it is TRUE, giving s.

If just returning the groups which satisfy the condition is sufficient then we are done; otherwise, define Groups as a one column data frame with one row per Group and merge that with s so that we get an NA for those groups not satisfying the condition.

library(zoo)

ok <- function(x) {
  r <- rollapplyr(x, list(-(1:3)), all, fill = FALSE)
  r & cumsum(r) == 1  # TRUE only at the first position preceded by 3 TRUEs
}
s <- subset(DF, ave(Speed > 35, Group, FUN = ok))

Groups <- data.frame(Group = unique(DF$Group))
merge(Groups, s, all.x = TRUE)[1:2]
##   Group Time
## 1     1   35
## 2     2   25
## 3     3   NA

2) A second approach is to split DF by group and then perform the calculation over each component of the split.

library(zoo)

calc <- function(x) {
  r <- rollapplyr(x$Speed > 35, list(-(1:3)), all, fill = FALSE)
  # index of the first row preceded by 3 TRUEs (NA if none), mapped to its Time
  x$Time[c(which(cumsum(r) == 1), NA)[1]]
}
sapply(split(DF, DF$Group), calc)
##  1  2  3 
## 35 25 NA 
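
For comparison, the same split-and-apply calculation can be sketched in base R alone, using run lengths instead of a rolling window. This is an illustration only, assuming rows are already ordered by Time within each Group. first_return_time is a hypothetical helper, and the data frame is rebuilt inline so that the block runs on its own.

```r
DF <- data.frame(
  Group = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
  Time  = c(5, 10, 15, 20, 25, 30, 35, 40, 5, 10, 15, 20, 25, 5, 10),
  Speed = c(25, 23, 21, 40, 42, 52, 48, 45, 22, 36, 38, 46, 53, 45, 58)
)

# Time of the first row that follows k consecutive speeds above thresh
first_return_time <- function(x, thresh = 35, k = 3) {
  above <- x$Speed > thresh
  pos_in_run <- sequence(rle(above)$lengths)  # position within each run
  ends <- which(above & pos_in_run == k)      # k-th TRUE of each run
  x$Time[ends[1] + 1]                         # NA if no run or no following row
}

res <- sapply(split(DF, DF$Group), first_return_time)
res
```

The result is a vector named by group. A group with no qualifying run, or whose run ends on its last row, gets NA.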

Note

Lines <- "Group   Time    Speed
1       5       25
1       10      23
1       15      21
1       20      40 # Speed > 35
1       25      42 # Speed > 35
1       30      52 # Speed > 35
1       35      48 # <--- Return time = 35 as answer for Group 1 !
1       40      45
2       5       22
2       10      36 # Speed > 35
2       15      38 # Speed > 35
2       20      46 # Speed > 35
2       25      53 # <--- Return time = 25 as answer for Group 2 !
3       5       45
3       10      58 # <--- Return time = NA as answer for group 3 !"
DF <- read.table(text = Lines, header = TRUE)


 