I have a data frame with datetimes and values, like so:
             datetime value
1 2016-05-03 08:51:41     0
2 2016-05-03 10:36:24     0
3 2016-05-03 10:36:32     9
4 2016-05-03 10:45:01     5
5 2016-05-03 10:45:24     0
6 2016-05-03 19:37:02     0
7 2016-05-03 19:37:06     7
8 2016-05-03 19:48:38     0
What I would like is a table that contains start and stop times for periods over which the value was constant. For the table above the expected output is the following:
  value               start                stop
1     0                <NA> 2016-05-03 10:36:32
2     9 2016-05-03 10:36:32 2016-05-03 10:45:01
3     5 2016-05-03 10:45:01 2016-05-03 10:45:24
4     0 2016-05-03 10:45:24 2016-05-03 19:37:06
5     7 2016-05-03 19:37:06 2016-05-03 19:48:38
6     0 2016-05-03 19:48:38                <NA>
dput of the original table:
structure(list(datetime = structure(c(1462258301, 1462264584,
1462264592, 1462265101, 1462265124, 1462297022, 1462297026, 1462297718
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(0,
0, 9, 5, 0, 0, 7, 0)), class = "data.frame", row.names = c(NA,
-8L), .Names = c("datetime", "value"))
Using data.table (assuming the data frame above is stored as DF)...
library(data.table)
setDT(DF)
res = DF[, .(end = datetime[.N]), by=.(value, seq = rleid(value))]
res[.N, end := NA]
   value seq                 end
1:     0   1 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:45:01
4:     0   4 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:06
6:     0   6                <NA>
I would stop at this point, since it is redundant to add the start
column. If you really want it:
res[, start := shift(end)]
setcolorder(res, c("value", "seq", "start", "end"))
   value seq               start                 end
1:     0   1                <NA> 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:24 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:36:32 2016-05-03 04:45:01
4:     0   4 2016-05-03 04:45:01 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:02 2016-05-03 13:37:06
6:     0   6 2016-05-03 13:37:06                <NA>
How it works:
DT[i, j, by] filters to i and then computes j in each subset determined by the by argument
.() is just a shortcut for list()
rleid identifies each "run" of identical values
.N is the number of rows in a by group (or the number of rows in the table if by is blank)
:= modifies columns by reference
shift is a lag/lead operator
setcolorder rearranges columns by reference
(Note that my result doesn't look like the OP's. The hours differ because the dput stores tzone = "", so the datetimes print in each machine's local time zone. In addition, end here is the last timestamp within each run, whereas the OP's stop is the first timestamp of the following run. POSIX datetime objects can be finicky; I recommend IDateTime from the data.table package instead.)
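If the goal is the exact table from the question, where each period stops at the first timestamp of the next run, the same tools can be arranged slightly differently. The following is a minimal sketch rather than the code from the answer above. It assumes the question's data is stored as DF, and it sets the time zone to UTC so the printout is the same on every machine (the question's own printout appears to be two hours ahead of UTC).

library(data.table)

# Rebuild the question's data from its dput; the name DF is an assumption.
DF <- data.frame(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)
setDT(DF)

# One row per run: the first timestamp of a run marks where it begins.
res <- DF[, .(start = datetime[1]), by = .(value, seq = rleid(value))]

# A run stops where the next one begins; the last run gets NA.
res[, stop := shift(start, type = "lead")]

# The first run may have begun before the data did, so its start is unknown.
res[1, start := NA]

# Drop the helper run id.
res[, seq := NULL]
print(res)

The order of the last steps matters: stop is derived from start before the first start is blanked out, otherwise that information would be lost.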
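Let's assume your first data frame is named x. Then do:
data.frame(value=names(tapply(x$datetime, x$value, min)), start=tapply(x$datetime, x$value, min), stop=tapply(x$datetime, x$value, max))
Note that this groups by value rather than by run, so it returns one row per distinct value; the three separate runs of 0 in the example are merged into a single row.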
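When the same value can recur in separate runs, as 0 does in the question, a base R approach needs to group by run rather than by value. The following is a hedged sketch using rle; the helper names r, first, and out are assumptions introduced for illustration, and the time zone is set to UTC only to make the printout reproducible.

# Rebuild the question's data from its dput, stored as x.
x <- data.frame(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# Run-length encode the values: r$values holds one entry per run.
r <- rle(x$value)

# Row index of the first observation in each run.
first <- cumsum(c(1, head(r$lengths, -1)))

out <- data.frame(
  value = r$values,
  start = x$datetime[first],
  # Each run stops where the next begins; indexing with NA yields NA.
  stop  = x$datetime[c(first[-1], NA)]
)

# The first run's true start is unknown.
out$start[1] <- NA
print(out)

Indexing the datetime column with an NA index keeps the POSIXct class, which avoids the class loss that can happen when date-times are combined or summarised through other routes.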