How to add a column based on specific row differences?
I have a data.table in the following format:
```r
library(data.table)
DT <- data.table(
  id = c("123", "123", "125", "125", "123", "123", "123"),
  action = c("started", "finished", "started", "finished", "started", "started", "finished"),
  time = as.POSIXct(c("2014-02-19 03:24:00", "2014-02-19 03:29:00", "2014-02-19 03:30:00", "2014-02-19 03:34:00",
                      "2014-02-19 08:24:00", "2014-02-19 09:45:00", "2014-02-19 10:33:00")))
```
```
   id   action                time
1 123  started 2014-02-19 03:24:00
2 123 finished 2014-02-19 03:29:00
3 125  started 2014-02-19 03:30:00
4 125 finished 2014-02-19 03:34:00
5 123  started 2014-02-19 08:24:00
6 123  started 2014-02-19 09:45:00
7 123 finished 2014-02-19 10:33:00
```
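I want to add a column that shows, for each id, the time difference between its rows (action "finished" minus "started"). The table is sorted by time, but data may be missing: for example, a "finished" action can be missing, as is the case for rows 5 and 6. In that case row 5 should be ignored and the difference between rows 6 and 7 calculated. The final table should look like this: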
```
   id   action                time durationInMinutes
1 123  started 2014-02-19 03:24:00                NA
2 123 finished 2014-02-19 03:29:00                 5
3 125  started 2014-02-19 03:30:00                NA
4 125 finished 2014-02-19 03:34:00                 4
5 123  started 2014-02-19 08:24:00                NA
6 123  started 2014-02-19 09:45:00                NA
7 123 finished 2014-02-19 10:33:00                48
```
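Because id 123 appears in two separate stretches (rows 1-2 and rows 5-7), "per id" here effectively means "per run of consecutive rows with the same id". A minimal sketch of that grouping, assuming the data.table package is installed (`runs` and `runs_base` are illustrative names):

```r
library(data.table)

id <- c("123", "123", "125", "125", "123", "123", "123")

# rleid() numbers each run of consecutive identical values
runs <- rleid(id)
print(runs)  # [1] 1 1 2 2 3 3 3

# Base-R equivalent: start a new run wherever id differs from the previous row
runs_base <- cumsum(c(TRUE, id[-1] != id[-length(id)]))
stopifnot(identical(as.integer(runs_base), runs))
```

Grouping by run rather than by `id` keeps the two stretches of id 123 apart, so the 03:24 start is never paired with the 10:33 finish.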
Is there a data.table solution?
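One approach: group consecutive rows that share the same id, subtract the last "started" time from the "finished" time within each group, then set the "started" rows to NA:

```r
library(data.table)

# rleid(id) gives each run of consecutive rows with the same id its own group
# (rows 1-2, 3-4 and 5-7 here). Within a run, the "finished" time minus the
# *last* "started" time skips the orphaned start in row 5.
# units = "mins" fixes the unit; difftime would otherwise choose it automatically.
DT[, duration := as.integer(difftime(time[action == "finished"],
                                     tail(time[action == "started"], 1),
                                     units = "mins")),
   by = rleid(id)][
   action == "started", duration := NA]
DT
#     id   action                time duration
#1:  123  started 2014-02-19 03:24:00       NA
#2:  123 finished 2014-02-19 03:29:00        5
#3:  125  started 2014-02-19 03:30:00       NA
#4:  125 finished 2014-02-19 03:34:00        4
#5:  123  started 2014-02-19 08:24:00       NA
#6:  123  started 2014-02-19 09:45:00       NA
#7:  123 finished 2014-02-19 10:33:00       48
```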
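A grouping-free variant is also possible if you can assume that every "finished" row sits directly below its own "started" row (true for the sample data). This is a sketch under that assumption, using data.table's `shift()` and `fifelse()` (the latter needs data.table 1.12.4 or later); it is not the method of the answer above:

```r
library(data.table)

DT <- data.table(
  id = c("123", "123", "125", "125", "123", "123", "123"),
  action = c("started", "finished", "started", "finished", "started", "started", "finished"),
  time = as.POSIXct(c("2014-02-19 03:24:00", "2014-02-19 03:29:00", "2014-02-19 03:30:00", "2014-02-19 03:34:00",
                      "2014-02-19 08:24:00", "2014-02-19 09:45:00", "2014-02-19 10:33:00")))

# A "finished" row gets a duration only if the row directly above it is a
# "started" row for the same id; every other row gets NA.
# as.numeric(time) is seconds since the epoch, so dividing by 60 gives minutes.
DT[, durationInMinutes := fifelse(
  action == "finished" & shift(action) == "started" & shift(id) == id,
  (as.numeric(time) - shift(as.numeric(time))) / 60,
  NA_real_
)]
DT
```

Working on raw seconds avoids any dependence on the automatic units of `difftime`. The trade-off is that this version only looks one row back, so it would miss a start and finish separated by other rows of the same id.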