Event data to start-stop
I have a data frame with datetimes and values, like so:
datetime value
1 2016-05-03 08:51:41 0
2 2016-05-03 10:36:24 0
3 2016-05-03 10:36:32 9
4 2016-05-03 10:45:01 5
5 2016-05-03 10:45:24 0
6 2016-05-03 19:37:02 0
7 2016-05-03 19:37:06 7
8 2016-05-03 19:48:38 0
What I would like is a table that contains start and stop times for periods over which the value was constant. For the table above, the expected output is the following:
value start stop
1 0 <NA> 2016-05-03 10:36:32
2 9 2016-05-03 10:36:32 2016-05-03 10:45:01
3 5 2016-05-03 10:45:01 2016-05-03 10:45:24
4 0 2016-05-03 10:45:24 2016-05-03 19:37:06
5 7 2016-05-03 19:37:06 2016-05-03 19:48:38
6 0 2016-05-03 19:48:38 <NA>
dput of the original table:
structure(list(datetime = structure(c(1462258301, 1462264584,
1462264592, 1462265101, 1462265124, 1462297022, 1462297026, 1462297718
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(0,
0, 9, 5, 0, 0, 7, 0)), class = "data.frame", row.names = c(NA,
-8L), .Names = c("datetime", "value"))
Using data.table...
library(data.table)
setDT(DF)
res = DF[, .(end = datetime[.N]), by=.(value, seq = rleid(value))]
res[.N, end := NA]
value seq end
1: 0 1 2016-05-03 04:36:24
2: 9 2 2016-05-03 04:36:32
3: 5 3 2016-05-03 04:45:01
4: 0 4 2016-05-03 13:37:02
5: 7 5 2016-05-03 13:37:06
6: 0 6 <NA>
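To see what the grouping key looks like, rleid can be run on the value column by itself. This is only a small illustration using the values from the question; the variable names are made up for the example.

```r
library(data.table)

# The value column from the question
value <- c(0, 0, 9, 5, 0, 0, 7, 0)

# rleid() gives each run of consecutive identical values its own id,
# so the later runs of 0 are kept apart from the first one
run_id <- rleid(value)
print(run_id)
# 1 1 2 3 4 4 5 6
```

Grouping by value alone would merge all the zeros; grouping by this id is what keeps the six periods separate.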
I would stop at this point, since it is redundant to add the start column. If you really want it:
res[, start := shift(end)]
setcolorder(res, c("value", "seq", "start", "end"))
value seq start end
1: 0 1 <NA> 2016-05-03 04:36:24
2: 9 2 2016-05-03 04:36:24 2016-05-03 04:36:32
3: 5 3 2016-05-03 04:36:32 2016-05-03 04:45:01
4: 0 4 2016-05-03 04:45:01 2016-05-03 13:37:02
5: 7 5 2016-05-03 13:37:02 2016-05-03 13:37:06
6: 0 6 2016-05-03 13:37:06 <NA>
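The table above takes the last timestamp of each run as its end, whereas the expected output in the question has each period stop where the next one starts. A possible variant along those lines is sketched below. It rebuilds the data from the question's dput with tz = "UTC", which is an assumption made here only so the result is reproducible; the original uses the session's local zone.

```r
library(data.table)

# Data from the question's dput; tz = "UTC" is an assumption (original: tzone = "")
DF <- data.table(
  datetime = as.POSIXct(
    c(1462258301, 1462264584, 1462264592, 1462265101,
      1462265124, 1462297022, 1462297026, 1462297718),
    origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# One row per run, keeping the first timestamp of each run
res <- DF[, .(start = datetime[1]), by = .(value, seq = rleid(value))]

# A run stops where the next one starts; the last run has no known stop
res[, stop := shift(start, type = "lead")]

# The first run began before the first observation, so its start is unknown
res[1, start := NA]

# Drop the helper column
res[, seq := NULL]
print(res)
```

The times print in UTC here, so they are shifted relative to both the question and the answer above, but the start/stop pattern matches the requested layout.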
How it works:
DT[i, j, by] filters to i and then computes j in each subset determined in by
.() is just a shortcut to list()
rleid identifies each "run" of identical values
.N is the number of rows in a by group (or the number of rows in a table if by is blank)
:= modifies columns by reference
shift is a lag/lead operator
setcolorder rearranges columns by reference
(Note that my result doesn't look like the OP's, either because the wrong dput was given or because POSIX datetime objects are incredibly finicky. I recommend IDateTime from the data.table package instead.)
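One likely source of the differing clock times is the time zone: the dput stores tzone = "", so R prints the timestamps in whatever zone the session runs in. The sketch below formats the second timestamp from the dput in three zones. The two named zones are assumptions, picked only because they reproduce the clock times shown in the question and in this answer.

```r
# Second timestamp from the question's dput, in seconds since the epoch
stamp <- as.POSIXct(1462264584, origin = "1970-01-01", tz = "UTC")

# The same instant rendered in different zones (zone names are assumptions)
format(stamp, tz = "UTC")               # "2016-05-03 08:36:24"
format(stamp, tz = "Europe/Berlin")     # "2016-05-03 10:36:24"
format(stamp, tz = "America/New_York")  # "2016-05-03 04:36:24"
```

Setting an explicit tz when building or printing the column avoids this kind of mismatch.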
Let's assume your first data frame is named x. Then do:
data.frame(value=names(tapply(x$datetime, x$value, min)), start=tapply(x$datetime, x$value, min), stop=tapply(x$datetime, x$value, max))
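The tapply call groups by value alone, so the three separate runs of 0 in the example collapse into a single row. A run-aware version in base R might look like the sketch below, using rle to find consecutive runs. The data frame is rebuilt from the question's dput with tz = "UTC" as an assumption, and the helper names (runs, first, stop_time, out) are made up for the example.

```r
# Data from the question's dput; tz = "UTC" is an assumption (original: tzone = "")
x <- data.frame(
  datetime = as.POSIXct(
    c(1462258301, 1462264584, 1462264592, 1462265101,
      1462265124, 1462297022, 1462297026, 1462297718),
    origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# Lengths and values of each run of consecutive identical values
runs <- rle(x$value)

# Row index at which each run begins
first <- cumsum(c(1, head(runs$lengths, -1)))

start <- x$datetime[first]

# A run stops where the next one starts; indexing with NA yields NA for the last run
stop_time <- start[c(seq_along(start)[-1], NA)]

# The first run began before the first observation, so its start is unknown
start[1] <- NA

out <- data.frame(value = runs$values, start = start, stop = stop_time)
print(out)
```

This yields one row per period, six in total for the example data, in the layout requested in the question.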