
(R, dplyr) How to aggregate-window data where rows must be conditionally included?

I've googled around without finding anything similar, but I'm hoping that what I'm trying to do has already been done by someone else.

  1. I have a set of data with timestamps.

  2. I need a running count of transactions per second (TPS), calculated over a true rolling one-second window. It would be nice to simply truncate or round each timestamp to the nearest second, but that won't be enough for my use case.
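The following base-R sketch shows why truncation falls short. It counts events per calendar second, using event times hand-copied (as seconds) from the example table below. It reports 9 events in second 0 and 7 in second 1. The rolling count in the table reaches 10, and bucketing with `floor()` cannot express a window such as "0.2 s to 1.2 s".

```r
# Event times in seconds, hand-copied from the example table in the question
# (an assumption: the real data would be parsed from the timestamp strings)
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)

# Truncating each timestamp to its whole second puts every event into a fixed
# bucket (second 0 or second 1), so windows that straddle a boundary are lost
calendar_counts <- table(floor(secs))
print(calendar_counts)
```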

Timestamp     Current TPS
00:00:00.1     1
00:00:00.2     2
00:00:00.3     3
00:00:00.4     4
00:00:00.5     5
00:00:00.6     6
00:00:00.7     7
00:00:00.8     8
00:00:00.9     9
00:00:01.0    10   <- 10 TPS here
00:00:01.1    10
00:00:01.2    10   <- still 10 TPS here
00:00:01.4     9   <- only 9 here, because there is no event at 00:00:01.3
00:00:01.5     9
00:00:01.5    10
00:00:01.8     8
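The "Current TPS" column above can be stated precisely. For row i, count the rows up to and including i whose time lies in the half-open window (t_i - 1 s, t_i]. The base-R sketch below implements that definition naively, in O(n^2), so it can serve as a reference for testing faster methods.

It rests on a few assumptions:
* Times are hand-copied from the table.
* Times are converted to whole milliseconds so the window boundary compares exactly.
* `rolling_count_reference` is a hypothetical helper name.

Because it only looks at rows up to i, the two 00:00:01.5 rows get 9 and then 10, matching the table.

```r
# Event times in seconds, hand-copied from the example table in the question
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)

# Assumption: at most millisecond resolution. Integer-valued doubles compare
# exactly, unlike e.g. 1.2 - 1, which is slightly less than 0.2
ms <- round(secs * 1000)

# Hypothetical reference helper: for each row i, count rows 1..i whose time
# falls in (ms[i] - window_ms, ms[i]]. Input is assumed sorted ascending, so
# the upper bound ms[i] holds automatically for rows 1..i
rolling_count_reference <- function(ms, window_ms = 1000) {
  vapply(seq_along(ms), function(i) {
    sum(ms[seq_len(i)] > ms[i] - window_ms)
  }, integer(1))
}

tps <- rolling_count_reference(ms)
print(tps)
```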

Initially, I planned to calculate the time difference between consecutive rows, but that doesn't tell me which rows should be included in or excluded from the aggregate window.
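Row-to-row differences alone don't locate where each window starts. On times sorted ascending, though, that start can be found with a binary search. The base-R sketch below assumes sorted input and times converted to whole milliseconds, hand-copied from the table.

For each row, `findInterval(ms - 1000, ms)` returns how many rows have a time at or before t_i - 1 s, that is, how many rows have already left the window. Subtracting that from the row number leaves the rows inside (t_i - 1 s, t_i].

```r
# Event times in seconds, hand-copied from the example table in the question
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)

# Assumption: at most millisecond resolution, so the boundary compares exactly
ms <- round(secs * 1000)

# findInterval() requires a non-decreasing vector; sort the data first if needed
stopifnot(!is.unsorted(ms))

# Number of rows with time <= t_i - 1 s, i.e. rows that have left the window
n_expired <- findInterval(ms - 1000, ms)

# Rows 1..i minus the expired ones = rows in (t_i - 1 s, t_i]
tps <- seq_along(ms) - n_expired
print(tps)
```

In a dplyr pipeline, the same expression should fit inside a `mutate()`, e.g. `tps = row_number() - findInterval(ms - 1000, ms)`, provided the rows are already arranged by time.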

This morning, I thought about mutating a new column that holds just the sub-second portion of each timestamp. I would then subtract that column from the time column and cumsum the result inside a second `if_else()` mutate that looks back over the last X rows.

Does that sound reasonable? Have I overlooked some other/better approach?
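The look-back idea can work if X varies per row rather than being fixed. The trick is to keep a pointer to the oldest row still inside the window and advance it as time moves on. The base-R sketch below shows that two-pointer technique, assuming sorted input and whole-millisecond times hand-copied from the table. `rolling_count_two_pointer` is a hypothetical helper name.

Each pointer only moves forward, so the whole pass is O(n), and it gives the same row-wise counts as the table.

```r
# Event times in seconds, hand-copied from the example table in the question
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)

# Assumption: at most millisecond resolution, so the boundary compares exactly
ms <- round(secs * 1000)

# Hypothetical helper: `left` marks the oldest row still inside the window
# (ms[i] - window_ms, ms[i]]. It never moves backwards, so the per-row
# look-back length i - left + 1 costs O(n) in total. Input assumed sorted.
rolling_count_two_pointer <- function(ms, window_ms = 1000) {
  out <- integer(length(ms))
  left <- 1L
  for (i in seq_along(ms)) {
    # Drop rows that are now window_ms or more older than row i;
    # the loop always stops at or before row i itself
    while (ms[left] <= ms[i] - window_ms) {
      left <- left + 1L
    }
    out[i] <- i - left + 1L
  }
  out
}

tps <- rolling_count_two_pointer(ms)
print(tps)
```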

library(dplyr)  # loaded here only for the %>% pipe

timestamps <- c("00:00:00.1", "00:00:00.2", "00:00:00.3", "00:00:00.4",
                "00:00:00.5", "00:00:00.6", "00:00:00.7", "00:00:00.8",
                "00:00:00.9", "00:00:01.0", "00:00:01.1", "00:00:01.2",
                "00:00:01.4", "00:00:01.5", "00:00:01.5", "00:00:01.8") %>%
  lubridate::hms() %>%  # convert to a time period in hours, minutes, seconds
  as.numeric()          # convert that to a number of seconds

slider::slide_index_dbl(timestamps,
                        timestamps,
                        ~length(.x),    # = how many timestamps are in the window
                        .before = .99)  # Note: using 1 here gave an incorrect result.
# slide_index() windows are closed, [t - .before, t], so .before = 1 can count an
# event exactly 1 s earlier, and floating-point error likely makes that boundary
# inconsistent (in R, 1.2 - 1 < 0.2 but 1.1 - 1 > 0.1)
# https://en.wikipedia.org/wiki/Floating-point_error_mitigation
#> [1]  1  2  3  4  5  6  7  8  9 10 10 10  9 10 10  8
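The next sketch is a variant of the slider call that avoids the `.99` tolerance, assuming timestamps have at most millisecond resolution. Because `slide_index()` windows include both ends, [t - .before, t], working in whole milliseconds with `.before = 999` gives exactly the half-open window (t - 1 s, t].

The `.99` version only matches that window while timestamps sit on a 0.01 s grid or coarser; an event 0.995 s earlier would be dropped. Also note that slider gives rows sharing a timestamp the same window. Both 00:00:01.5 rows therefore get 10 here, whereas the question's table lists 9 then 10. Reproducing that requires a row-wise count.

```r
library(slider)

# Event times in seconds, hand-copied from the question (already sorted)
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)

# Assumption: at most millisecond resolution. Whole milliseconds are
# integer-valued doubles, so there is no floating-point boundary error
ms <- round(secs * 1000)

# slide_index() windows are closed, [ms - .before, ms]; with integer
# milliseconds, .before = 999 is exactly the half-open window (t - 1 s, t]
tps <- slide_index_int(ms, ms, ~ length(.x), .before = 999)
print(tps)
```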
