(R, dplyr) How do I aggregate windowed data when rows must be conditionally included?

Question

I've googled around but haven't found anything similar, though I'm hoping what I'm trying to do has already been done by someone else.

I have a set of data with timestamps.
I need a running count of transactions per second (TPS), calculated over a true rolling one-second window. It would be nice to just truncate or round to the nearest second, but that won't be enough for my use case.

Timestamp     Current TPS
00:00:00.1     1
00:00:00.2     2
00:00:00.3     3
00:00:00.4     4
00:00:00.5     5
00:00:00.6     6
00:00:00.7     7
00:00:00.8     8
00:00:00.9     9
00:00:01.0    10    (10 TPS here)
00:00:01.1    10
00:00:01.2    10    (still 10 TPS here)
00:00:01.4     9    (only 9 here, because there is no event at 00:00:01.3)
00:00:01.5     9
00:00:01.5    10
00:00:01.8     8
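For comparison, truncating the sample timestamps to whole seconds (the shortcut ruled out above) produces fixed, non-overlapping buckets rather than a rolling window. Nine events land in second 0 and seven in second 1, so the peak of 10 events within a one-second span never appears. This is a minimal base-R sketch that assumes the timestamps have already been converted to milliseconds after 00:00:00. The values are hard-coded from the table above.

```r
# Sample timestamps from the question, as milliseconds after 00:00:00
ms <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
        1100, 1200, 1400, 1500, 1500, 1800)

# Truncate each event to its whole second and count events per bucket
bucket <- floor(ms / 1000)
per_second <- table(bucket)
print(per_second)
```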

Initially, I planned to compute the time difference between consecutive rows, but that doesn't answer the question of which rows should be included in, or excluded from, the aggregation window.
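The table above implies a precise inclusion rule. Row i counts every row j ≤ i whose timestamp falls in the half-open window (t_i - 1 s, t_i]. Because the rows are sorted, the rows that have dropped out of the window are exactly those with t_j ≤ t_i - 1 s. The count for row i is therefore i minus the number of such rows, and base R's `findInterval()` returns that number directly.

This is a minimal sketch under the following assumptions:
- The timestamps are sorted "HH:MM:SS.s" strings.
- `to_ms()` and `rolling_tps()` are hypothetical helpers, not functions from any package.
- Times are converted to integer milliseconds so that the window boundaries are compared exactly.

```r
# Hypothetical helper: convert "HH:MM:SS.s" strings to integer milliseconds,
# so window boundaries are compared exactly rather than as floating-point seconds.
to_ms <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    secs <- as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60 + as.numeric(p[3])
    round(secs * 1000)
  }, numeric(1))
}

# Hypothetical helper: rolling count over the half-open window (t_i - width, t_i],
# counting only rows 1..i, so rows sharing a timestamp count up one by one.
# Assumes ms is sorted in ascending order.
rolling_tps <- function(ms, width_ms = 1000) {
  stopifnot(!is.unsorted(ms))
  # findInterval(v, ms) = how many timestamps are <= v; those rows have
  # expired from row i's window, and every other row in 1..i is inside it.
  seq_along(ms) - findInterval(ms - width_ms, ms)
}

timestamps <- c("00:00:00.1", "00:00:00.2", "00:00:00.3", "00:00:00.4",
                "00:00:00.5", "00:00:00.6", "00:00:00.7", "00:00:00.8",
                "00:00:00.9", "00:00:01.0", "00:00:01.1", "00:00:01.2",
                "00:00:01.4", "00:00:01.5", "00:00:01.5", "00:00:01.8")

tps <- rolling_tps(to_ms(timestamps))
print(tps)
#> [1]  1  2  3  4  5  6  7  8  9 10 10 10  9  9 10  8
```

This reproduces the "Current TPS" column above, including 9 then 10 for the two rows at 00:00:01.5. Inside a dplyr pipeline, the same helper could be called as `mutate(tps = rolling_tps(to_ms(timestamp)))` on a data frame sorted by a hypothetical `timestamp` column.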

This morning I thought about mutating a new column that holds just the sub-second portion of the time. I would then subtract that new column from the time column and `cumsum()` it inside a second `if_else()` mutate that looks back over the last X rows.

Does that sound reasonable? Have I overlooked some other, better approach?

Answer 1

library(dplyr)  # only used here for the %>% pipe

timestamps <- c("00:00:00.1", "00:00:00.2", "00:00:00.3", "00:00:00.4", "00:00:00.5", "00:00:00.6", "00:00:00.7", "00:00:00.8", "00:00:00.9", "00:00:01.0", "00:00:01.1", "00:00:01.2", "00:00:01.4", "00:00:01.5", "00:00:01.5", "00:00:01.8") %>%
  lubridate::hms() %>%  # parse "HH:MM:SS.s" into a Period of hours, minutes and seconds
  as.numeric()          # convert that Period to a number of seconds

slider::slide_index_dbl(timestamps,     # values passed to .f
                        timestamps,     # index that defines each window
                        ~ length(.x),   # = how many timestamps are in the window
                        .before = 0.99) # window is [t - 0.99, t]. Note: using 1 here gave me
                        # an incorrect result, presumably due to floating-point
                        # arithmetic errors
                        # https://en.wikipedia.org/wiki/Floating-point_error_mitigation
                        # 0.99 is safe here because the timestamps have 0.1 s resolution.
#> [1]  1  2  3  4  5  6  7  8  9 10 10 10  9 10 10  8
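The `.99` workaround depends on the data's 0.1 s resolution. One way to avoid it is to do the comparison on integer milliseconds. The sketch below is a base-R alternative to `slider`, not the answer's own method. It counts all events in the half-open window (t - 1 s, t], including every row that shares the current timestamp. For this sample data it gives the same vector as the `slider` call above. Both rows at 00:00:01.5 therefore get 10, whereas the question's table counts them 9 then 10. The sketch assumes the timestamps are sorted, and it hard-codes them in seconds for brevity.

```r
# Sample timestamps from the question in seconds, converted to integer
# milliseconds (round() removes the noise from multiplying by 1000).
secs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
          1.1, 1.2, 1.4, 1.5, 1.5, 1.8)
ms <- round(secs * 1000)

# For each event t, count all events in (t - 1000, t]:
#   findInterval(ms, ms)        = events with time <= t (includes ties with t)
#   findInterval(ms - 1000, ms) = events with time <= t - 1000 (already expired)
tps <- findInterval(ms, ms) - findInterval(ms - 1000, ms)
print(tps)
#> [1]  1  2  3  4  5  6  7  8  9 10 10 10  9 10 10  8
```

The same idea should carry over to `slider` itself. Pass the integer milliseconds as the index with `.before = 999`. That gives the closed window [t - 999, t], which for whole-millisecond data is the same as (t - 1000, t].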

Answer 1: accepted, 2021-02-08 00:44:49
