Event data to start-stop

Question

I have a data frame with datetimes and values, like so:

             datetime value
1 2016-05-03 08:51:41     0
2 2016-05-03 10:36:24     0
3 2016-05-03 10:36:32     9
4 2016-05-03 10:45:01     5
5 2016-05-03 10:45:24     0
6 2016-05-03 19:37:02     0
7 2016-05-03 19:37:06     7
8 2016-05-03 19:48:38     0

What I would like is a table that contains start and stop times for periods over which the value was constant. For the table above the expected output is the following:

  value               start                stop
1     0                <NA> 2016-05-03 10:36:32
2     9 2016-05-03 10:36:32 2016-05-03 10:45:01
3     5 2016-05-03 10:45:01 2016-05-03 10:45:24
4     0 2016-05-03 10:45:24 2016-05-03 19:37:06
5     7 2016-05-03 19:37:06 2016-05-03 19:48:38
6     0 2016-05-03 19:48:38                <NA>

dput of the original table

structure(list(datetime = structure(c(1462258301, 1462264584, 
1462264592, 1462265101, 1462265124, 1462297022, 1462297026, 1462297718
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(0, 
0, 9, 5, 0, 0, 7, 0)), class = "data.frame", row.names = c(NA, 
-8L), .Names = c("datetime", "value"))

Answer 1

Using data.table...

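library(data.table)
setDT(DF)  # DF is the data frame from the question's dput

# one row per run of identical values; end is the last timestamp in the run
res = DF[, .(end = datetime[.N]), by=.(value, seq = rleid(value))]
# the final run is still open, so its end is unknown
res[.N, end := NA]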

   value seq                 end
1:     0   1 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:45:01
4:     0   4 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:06
6:     0   6                <NA>

I would stop at this point, since it is redundant to add the start column. If you really want it:

res[, start := shift(end)]
setcolorder(res, c("value", "seq", "start", "end"))


   value seq               start                 end
1:     0   1                <NA> 2016-05-03 04:36:24
2:     9   2 2016-05-03 04:36:24 2016-05-03 04:36:32
3:     5   3 2016-05-03 04:36:32 2016-05-03 04:45:01
4:     0   4 2016-05-03 04:45:01 2016-05-03 13:37:02
5:     7   5 2016-05-03 13:37:02 2016-05-03 13:37:06
6:     0   6 2016-05-03 13:37:06                <NA>

How it works:

DT[i, j, by] filters to i and then computes j in each subset determined in by
.() is just a shortcut to list()
rleid identifies each "run" of identical values
.N is the number of rows in a by group (or the number of rows in a table if by is blank)
:= modifies columns by reference
shift is a lag/lead operator
setcolorder rearranges columns by reference
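
To make the pieces above concrete, here is a small sketch on a toy vector. The names v, run_id, lagged, led, toy and counts are made up for illustration and do not come from the answer's code.

```r
library(data.table)

# Toy vector with four runs: (0 0), (9), (5), (0 0)
v <- c(0, 0, 9, 5, 0, 0)

# rleid gives every run of identical values its own integer id
run_id <- rleid(v)

# shift lags by one position by default; type = "lead" looks ahead instead
lagged <- shift(v)
led <- shift(v, type = "lead")

# Inside j, .N is the number of rows in the current by group
toy <- data.table(v = v, run = run_id)
counts <- toy[, .(n = .N), by = run]
print(counts)
```

Note that the two runs of 0 get different ids, which is why the grouping uses rleid(value) rather than value alone.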

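(Note that my result doesn't match the OP's. The hours differ because the dput has tzone = "", so the datetimes print in each session's local time zone. The minutes and seconds differ because end here is the last timestamp in each run, whereas the OP's stop is the first timestamp of the next run. POSIX datetime objects can be finicky; I recommend IDateTime from the data.table package instead.)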
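
For comparison, here is a hedged sketch of a variant that follows the layout of the expected table in the question, assuming each run should start at its first timestamp and stop at the first timestamp of the next run. The name res2 and the choice of tz = "UTC" are assumptions made for this example (the original dput has tzone = "", so it prints in the local time zone).

```r
library(data.table)

# Data from the question's dput; tz = "UTC" is an assumption so that the
# printed times do not depend on the session's time zone
DF <- data.table(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# One row per run, keeping the first timestamp of the run as its start
res2 <- DF[, .(start = datetime[1]), by = .(value, seq = rleid(value))]

# A run stops where the next one starts; the last run has no known stop
res2[, stop := shift(start, type = "lead")]

# The first run was already in progress at the first observation
res2[1, start := NA]

# Drop the helper run id
res2[, seq := NULL]
print(res2)
```

Apart from the time zone used for printing, this should line up row by row with the table the OP asked for.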

Answer 2

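Let's assume your first data frame is named x. Then do:

data.frame(value = names(tapply(x$datetime, x$value, min)),
           start = as.POSIXct(tapply(x$datetime, x$value, min), origin = "1970-01-01"),
           stop  = as.POSIXct(tapply(x$datetime, x$value, max), origin = "1970-01-01"))

Note that this groups by value alone, so separate runs that share a value (0 in the example) are merged into a single row. The as.POSIXct calls restore the datetime class in case tapply returns plain numbers.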
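
A run-aware alternative in base R could look like the sketch below. It is not part of the answer above: it swaps tapply for rle so that repeated values stay in separate rows. The names r, first, start_time, stop_time and out, and the tz = "UTC" setting, are assumptions for illustration.

```r
# x holds the question's data; tz = "UTC" is an assumption for stable printing
x <- data.frame(
  datetime = as.POSIXct(c(1462258301, 1462264584, 1462264592, 1462265101,
                          1462265124, 1462297022, 1462297026, 1462297718),
                        origin = "1970-01-01", tz = "UTC"),
  value = c(0, 0, 9, 5, 0, 0, 7, 0)
)

# rle returns the length and the value of each run of identical values
r <- rle(x$value)

# Row index at which each run begins
first <- cumsum(c(1, head(r$lengths, -1)))

# Start of each run, and the start of the following run as its stop;
# indexing with NA yields NA for the last, still open, run
start_time <- x$datetime[first]
stop_time <- start_time[c(seq_along(start_time)[-1], NA)]

# The first run began before the first observation
start_time[1] <- NA

out <- data.frame(value = r$values, start = start_time, stop = stop_time)
print(out)
```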

solution1
5 ACCEPTED 2016-05-04 15:49:47

solution2
0 2016-05-04 15:39:56