
Event data to start-stop

I have a data frame with datetimes and values, like so:

             datetime value
1 2016-05-03 08:51:41     0
2 2016-05-03 10:36:24     0
3 2016-05-03 10:36:32     9
4 2016-05-03 10:45:01     5
5 2016-05-03 10:45:24     0
6 2016-05-03 19:37:02     0
7 2016-05-03 19:37:06     7
8 2016-05-03 19:48:38     0

What I would like is a table that contains start and stop times for the periods over which the value was constant. For the table above, the expected output is the following:

  value               start                stop
1     0                <NA> 2016-05-03 10:36:32
2     9 2016-05-03 10:36:32 2016-05-03 10:45:01
3     5 2016-05-03 10:45:01 2016-05-03 10:45:24
4     0 2016-05-03 10:45:24 2016-05-03 19:37:06
5     7 2016-05-03 19:37:06 2016-05-03 19:48:38
6     0 2016-05-03 19:48:38                <NA>

dput of the original table:

structure(list(datetime = structure(c(1462258301, 1462264584, 
1462264592, 1462265101, 1462265124, 1462297022, 1462297026, 1462297718
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(0, 
0, 9, 5, 0, 0, 7, 0)), class = "data.frame", row.names = c(NA, 
-8L), .Names = c("datetime", "value"))

Using data.table:

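library(data.table)
setDT(DF)  # DF is the data frame from the question; converted by reference

# one row per run of identical values; end = last timestamp seen in that run
res = DF[, .(end = datetime[.N]), by=.(value, seq = rleid(value))]
# the final run is still open, so its end is unknown
res[.N, end := NA]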

   value seq                 end
1:     0   1 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:45:01
4:     0   4 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:06
6:     0   6                <NA>

I would stop at this point, since adding a start column is redundant: each start is just the previous row's end. If you really want it:

res[, start := shift(end)]
setcolorder(res, c("value", "seq", "start", "end"))


   value seq               start                 end
1:     0   1                <NA> 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:24 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:36:32 2016-05-03 04:45:01
4:     0   4 2016-05-03 04:45:01 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:02 2016-05-03 13:37:06
6:     0   6 2016-05-03 13:37:06                <NA>
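The table above takes the last timestamp inside each run as its end, whereas the expected output in the question uses the first timestamp of the next run as the stop. A minimal sketch of that variant follows. It assumes the data from the question's dput, pins the time zone to UTC so the printout is reproducible (the original uses tzone = ""), and the names runs and run are only illustrative.

```r
library(data.table)

# Data from the question's dput; tz = "UTC" is an assumption for reproducibility.
DF <- data.table(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# One row per run of identical values; keep the FIRST timestamp of each run.
runs <- DF[, .(start = datetime[1]), by = .(value, run = rleid(value))]

# A run stops at the moment the next run starts; the last run has no stop yet.
runs[, stop := shift(start, type = "lead")]

# The first run was already under way when recording began, so its start is unknown.
runs[1, start := NA]

# Drop the helper column so only value, start and stop remain.
runs[, run := NULL]
print(runs)
```

Apart from the time zone shift, the rows should line up with the table the question asks for.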

How it works:

  • DT[i, j, by] filters rows with i, then computes j within each group defined in by
  • .() is just a shortcut for list()
  • rleid identifies each "run" of identical values
  • .N is the number of rows in a by group (or the number of rows in the table if by is blank)
  • := modifies columns by reference
  • shift is a lag/lead operator
  • setcolorder rearranges columns by reference
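The building blocks listed above can be tried in isolation on the value column from the question. This is only an illustrative sketch; the names v, run_id, lagged, led, toy and sizes are made up for the example.

```r
library(data.table)

v <- c(0, 0, 9, 5, 0, 0, 7, 0)

# rleid: consecutive identical values share an id, and the id increases at each change.
run_id <- rleid(v)
print(run_id)

# shift: the default is a lag of one position; type = "lead" looks ahead instead.
lagged <- shift(v)
led <- shift(v, type = "lead")
print(lagged)
print(led)

# .N inside a by group: the number of rows in each run.
toy <- data.table(v = v)
sizes <- toy[, .(n = .N), by = .(v, run = rleid(v))]
print(sizes)
```

Grouping by both the value and the run id keeps the value as a column in the result while still separating the different runs of 0.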

(Note that my result doesn't look like the OP's, either because the wrong dput was given or because POSIX datetime objects are incredibly finicky. I recommend IDateTime from the data.table package instead.)
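One likely contributor to the mismatch in the hours, offered as a sketch rather than a diagnosis: the dput stores tzone = "", which makes R print each instant in the local time zone of whoever runs the code. The example below formats the second timestamp from the dput in three zones. The zone names are assumptions chosen for illustration, not the actual locations of either poster.

```r
# Second timestamp from the question's dput, in seconds since the epoch.
instant <- as.POSIXct(1462264584, origin = "1970-01-01", tz = "UTC")

fmt <- "%Y-%m-%d %H:%M:%S"

# The same instant rendered in three time zones (illustrative choices).
utc      <- format(instant, fmt, tz = "UTC")
berlin   <- format(instant, fmt, tz = "Europe/Berlin")
new_york <- format(instant, fmt, tz = "America/New_York")

print(c(UTC = utc, Berlin = berlin, NewYork = new_york))
```

Setting an explicit tz when building or printing the datetimes keeps the output identical across machines.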

Let's assume your first data frame is named x. Then do:

data.frame(value=names(tapply(x$datetime, x$value, min)), start=tapply(x$datetime, x$value, min), stop=tapply(x$datetime, x$value, max))

Note that this groups by distinct value rather than by run, so the separate runs of 0 are merged into a single row.
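For grouping per run in base R, rle() can play the role that rleid plays in data.table. The following is a minimal sketch, assuming the data from the question's dput with the time zone pinned to UTC for reproducibility; the names r, first_row and out are illustrative.

```r
# Data from the question's dput; tz = "UTC" is an assumption for reproducibility.
x <- data.frame(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# Run-length encoding: one entry per run of identical values.
r <- rle(x$value)

# Row index of the first observation in each run.
first_row <- cumsum(c(1, head(r$lengths, -1)))
start <- x$datetime[first_row]

out <- data.frame(
  value = r$values,
  start = start,
  # Each run stops where the next one starts; indexing with NA yields NA for the last run.
  stop = start[c(seq_along(start)[-1], NA)]
)

# The first run began before recording started, so its start is unknown.
out$start[1] <- NA
print(out)
```

Indexing with NA rather than combining with c(..., NA) keeps the POSIXct class and time zone of the column intact.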

