
How to calculate a time period until a condition is matched

I need to calculate the total time covered by each run of consecutive dates, where a run ends as soon as the time difference between two consecutive dates is greater than 13 seconds.

For example, in the data frame created with the code shown below, the column test holds the time difference between each date and the next. What I need is the duration of each event, that is, of each stretch of rows lying between the lines where test > 13 seconds.

# Create a vector of dates with a random time difference in seconds between records
dates <- seq(as.POSIXct("2020-01-01 00:00:02"), as.POSIXct("2020-01-02 00:00:02"), by = "2 sec")
dates <- dates + sample(15, length(dates), replace = TRUE)

# Create a data.frame
data <- data.frame(id = seq_along(dates), dates = dates)

# Create a test field with the time difference between each date and the next
data$test <- c(diff(data$dates, lag = 1), 0)

# Delete the zero and negative time
data <- data[data$test > 0, ]

head(data)

What I want is something like this:

[image: enter image description here]

To get to your desired result we need to define 'blocks' of observations. A new block starts wherever test is greater than 13.
We start by flagging the rows where test is at most 13 (the code calls this flag split_point, although it is TRUE for the rows we keep). Using the rle function on that flag, we can assign an ID to each run of rows. Then we drop the rows where test exceeds 13 and summarize the remaining blocks: once with the sum of the seconds, and once with the minimum of the event dates.

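# TRUE for rows inside a block (gap of at most 13 seconds); FALSE marks a split
split_point <- data$test <= 13
# Find continuous blocks
block_str <- rle(split_point)
# Create block IDs
data$block <- rep(seq_along(block_str$lengths), block_str$lengths)
data <- data[split_point, ] # Drop the rows where test > 13 (the actual splits)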

# Summarize
final_df <- aggregate(test ~ block, data = data, FUN = sum)
dtevent <- aggregate(dates ~ block, data = data, FUN = min)

# Join the two summaries
final_df$DatetimeEvent <- dtevent$dates

head(final_df)
#>   block test       DatetimeEvent
#> 1     1 101  2020-01-01 00:00:09
#> 2     3 105  2020-01-01 00:01:11
#> 3     5 277  2020-01-01 00:02:26
#> 4     7  46  2020-01-01 00:04:58
#> 5     9  27  2020-01-01 00:05:30
#> 6    11 194  2020-01-01 00:05:44
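
To see how rle turns the logical flag into block IDs, here is a minimal sketch on a small hand-made vector. The names test_toy, keep, runs and totals and the values in test_toy are made up for illustration; they are not taken from the data above.

```r
# Hypothetical gaps in seconds (illustrative values, not from the question's data)
test_toy <- c(5, 7, 14, 3, 2, 20, 6)

# TRUE where the gap stays within the 13-second threshold
keep <- test_toy <= 13

# rle() compresses keep into runs; repeating the run index gives one ID per row
runs <- rle(keep)
block <- rep(seq_along(runs$lengths), runs$lengths)

# Sum the gaps per block, using only the rows within the threshold
totals <- tapply(test_toy[keep], block[keep], sum)

print(block)
print(totals)
```

In this toy vector the kept runs get the odd IDs 1, 3 and 5, because the vector starts with a value within the threshold. The even IDs belong to the rows that are filtered out, which would explain why only odd block numbers appear in the output above.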

Created on 2020-04-02 by the reprex package (v0.3.0)

Using dplyr for convenience:

library(dplyr)

final_df <- data %>%
  mutate(split_point = test <= 13,
         block = with(rle(split_point), rep(seq_along(lengths), lengths))) %>%
  group_by(block) %>%
  filter(split_point) %>%
  summarise(DateTimeEvent = min(dates), TotalTime = sum(test))

final_df
#> # A tibble: 1,110 x 3
#>    block DateTimeEvent       TotalTime
#>    <int> <dttm>              <drtn>   
#>  1     1 2020-01-01 00:00:06 260 secs 
#>  2     3 2020-01-01 00:02:28 170 secs 
#>  3     5 2020-01-01 00:04:11 528 secs 
#>  4     7 2020-01-01 00:09:07  89 secs 
#>  5     9 2020-01-01 00:10:07  37 secs 
#>  6    11 2020-01-01 00:10:39 135 secs 
#>  7    13 2020-01-01 00:11:56  50 secs 
#>  8    15 2020-01-01 00:12:32 124 secs 
#>  9    17 2020-01-01 00:13:52  98 secs 
#> 10    19 2020-01-01 00:14:47  83 secs 
#> # … with 1,100 more rows

Created on 2020-04-02 by the reprex package (v0.3.0)

(The results differ from the other answer because the sample data is random and reprex regenerates it on each run.)
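
If matching numbers across runs matters, one option is to fix the random seed before calling sample. The sketch below only demonstrates that effect. The seed value 42 and the names offsets_a and offsets_b are arbitrary choices for illustration and are not part of the original answers.

```r
# Fix the seed so sample() returns the same offsets on every run
set.seed(42)  # arbitrary seed, chosen for illustration
offsets_a <- sample(15, 10, replace = TRUE)

# Reset the generator to the same state and draw again
set.seed(42)
offsets_b <- sample(15, 10, replace = TRUE)

# Both draws are identical because the generator started from the same state
print(identical(offsets_a, offsets_b))
```

Placing a set.seed call before the line that adds the random offsets to dates should make the question's data frame, and therefore both summaries, reproducible.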
