Start and stop times of event data

Question

I have a data frame with datetimes and values, like so:

             datetime value
1 2016-05-03 08:51:41     0
2 2016-05-03 10:36:24     0
3 2016-05-03 10:36:32     9
4 2016-05-03 10:45:01     5
5 2016-05-03 10:45:24     0
6 2016-05-03 19:37:02     0
7 2016-05-03 19:37:06     7
8 2016-05-03 19:48:38     0

What I would like is a table that contains start and stop times for the periods over which the value was constant. For the table above, the expected output is the following:

  value               start                stop
1     0                <NA> 2016-05-03 10:36:32
2     9 2016-05-03 10:36:32 2016-05-03 10:45:01
3     5 2016-05-03 10:45:01 2016-05-03 10:45:24
4     0 2016-05-03 10:45:24 2016-05-03 19:37:06
5     7 2016-05-03 19:37:06 2016-05-03 19:48:38
6     0 2016-05-03 19:48:38                <NA>

dput of the original table:

structure(list(datetime = structure(c(1462258301, 1462264584, 
1462264592, 1462265101, 1462265124, 1462297022, 1462297026, 1462297718
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(0, 
0, 9, 5, 0, 0, 7, 0)), class = "data.frame", row.names = c(NA, 
-8L), .Names = c("datetime", "value"))

Answer 1

Using data.table:

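library(data.table)
setDT(DF)  # DF is the data frame from the question; converted to a data.table by reference

# One row per run of identical values; end is the last timestamp in that run
res = DF[, .(end = datetime[.N]), by=.(value, seq = rleid(value))]
res[.N, end := NA]  # the final run has no known end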

   value seq                 end
1:     0   1 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:45:01
4:     0   4 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:06
6:     0   6                <NA>

I would stop at this point, since it is redundant to add the start column. If you really want it:

res[, start := shift(end)]
setcolorder(res, c("value", "seq", "start", "end"))


   value seq               start                 end
1:     0   1                <NA> 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:24 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:36:32 2016-05-03 04:45:01
4:     0   4 2016-05-03 04:45:01 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:02 2016-05-03 13:37:06
6:     0   6 2016-05-03 13:37:06                <NA>

How it works:

DT[i, j, by] filters to i and then computes j in each group defined by by
.() is just a shortcut for list()
rleid identifies each "run" of identical values
.N is the number of rows in a by group (or the number of rows in the table if by is blank)
:= modifies columns by reference
shift is a lag/lead operator
setcolorder rearranges columns by reference
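
The helpers listed above can be tried in isolation. The following is a minimal sketch on the value column from the question; the variable names (run_id, lagged, led, run_sizes) are illustrative and not part of the answer.

```r
library(data.table)

# The value column from the question
value <- c(0, 0, 9, 5, 0, 0, 7, 0)

# rleid gives every run of identical values its own id
run_id <- rleid(value)                 # 1 1 2 3 4 4 5 6

# shift lags by one position by default; type = "lead" goes the other way
lagged <- shift(1:3)                   # NA 1 2
led    <- shift(1:3, type = "lead")    # 2 3 NA

# .N counts the rows in each by group, here one group per run
DT <- data.table(value = value)
run_sizes <- DT[, .N, by = .(run = rleid(value))]
```

run_sizes has one row per run, with N equal to 2, 1, 1, 2, 1, 1 for this input.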

(Note that my result doesn't look like the OP's. The hours most likely differ because the dput has tzone = "", so the datetimes print in each reader's local time zone; POSIX datetime objects are finicky in this respect. I recommend IDateTime from the data.table package instead.)
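
The difference in hours can be checked by formatting one timestamp from the dput in several time zones. This is only a sketch: the zone names "Europe/Berlin" and "America/New_York" are assumptions, chosen because they reproduce the hours printed in the question and in the answer; the actual zones of the two machines are not stated.

```r
# Second timestamp from the dput, in seconds since the Unix epoch
t2 <- as.POSIXct(1462264584, origin = "1970-01-01", tz = "UTC")

# The same instant printed in three zones (zone names are assumptions)
in_utc    <- format(t2, tz = "UTC")               # "2016-05-03 08:36:24"
in_berlin <- format(t2, tz = "Europe/Berlin")     # "2016-05-03 10:36:24"
in_ny     <- format(t2, tz = "America/New_York")  # "2016-05-03 04:36:24"
```

Setting the tzone attribute explicitly, for example attr(DF$datetime, "tzone") <- "UTC", would make the printed output the same on every machine.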

Answer 2

Let's assume your first data frame is named x. Then do:

data.frame(value=names(tapply(x$datetime, x$value, min)), start=tapply(x$datetime, x$value, min), stop=tapply(x$datetime, x$value, max))
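
Note that tapply groups by value, so the three separate runs of 0 fall into a single group. A base R sketch that works per run instead is shown below. It assumes, as in the question's expected output, that a run stops at the first timestamp of the next run, that the first start and last stop are unknown, and that the times are stored in UTC; the names is_start and runs are illustrative.

```r
# Data from the question's dput, with the time zone pinned to UTC (an assumption)
x <- data.frame(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# TRUE on the first row of every run of identical values
is_start <- c(TRUE, x$value[-1] != x$value[-nrow(x)])

# One row per run: its value and the timestamp at which it was first seen
runs <- data.frame(value = x$value[is_start], start = x$datetime[is_start])

# A run stops where the next one starts; the NA index leaves the last stop unknown
runs$stop <- runs$start[c(seq_len(nrow(runs))[-1], NA)]

# The first run began before the first observation, so its start is unknown
runs$start[1] <- NA
```

With the sample data, runs has six rows with values 0, 9, 5, 0, 7, 0, matching the layout of the expected output in the question (the times print in UTC here).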

2 answers

Answer 1: score 5, accepted, 2016-05-04 15:49:47
Answer 2: score 0, 2016-05-04 15:39:56
