How to add column based on specific row differences?

I have a data.table in the following format:

library(data.table)

DT <- data.table(
  id     = c("123", "123", "125", "125", "123", "123", "123"),
  action = c("started", "finished", "started", "finished",
             "started", "started", "finished"),
  time   = as.POSIXct(c("2014-02-19 03:24:00", "2014-02-19 03:29:00",
                        "2014-02-19 03:30:00", "2014-02-19 03:34:00",
                        "2014-02-19 08:24:00", "2014-02-19 09:45:00",
                        "2014-02-19 10:33:00"))
)
    id  action      time
1   123 started     2014-02-19 03:24:00
2   123 finished    2014-02-19 03:29:00
3   125 started     2014-02-19 03:30:00
4   125 finished    2014-02-19 03:34:00
5   123 started     2014-02-19 08:24:00
6   123 started     2014-02-19 09:45:00
7   123 finished    2014-02-19 10:33:00

I would like to add a column that shows the time difference ("finished" minus "started") between rows for each id. The table is sorted by time, but data may be missing: for example, a "finished" action can be absent, as is the case for rows 5 and 6. In that case row 5 should be ignored and the difference between rows 6 and 7 calculated. The final table should look like this:

    id  action      time                   durationInMinutes
1   123 started     2014-02-19 03:24:00    NA
2   123 finished    2014-02-19 03:29:00    5
3   125 started     2014-02-19 03:30:00    NA
4   125 finished    2014-02-19 03:34:00    4
5   123 started     2014-02-19 08:24:00    NA
6   123 started     2014-02-19 09:45:00    NA
7   123 finished    2014-02-19 10:33:00    48

Is there a data.table solution for that?

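# Group by consecutive runs of the same id (rows 1-2, 3-4, 5-7).
# Within each run, subtract the last "started" time from the "finished" time.
# units = "mins" keeps the result in minutes even for gaps of an hour or more.
DT[, duration := as.integer(difftime(time[action == "finished"],
                                     tail(time[action == "started"], 1),
                                     units = "mins"))
   , by = rleid(id)][
     # Only "finished" rows should carry a duration
     action == "started", duration := NA_integer_]
DT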
#    id   action                time duration
#1: 123  started 2014-02-19 03:24:00       NA
#2: 123 finished 2014-02-19 03:29:00        5
#3: 125  started 2014-02-19 03:30:00       NA
#4: 125 finished 2014-02-19 03:34:00        4
#5: 123  started 2014-02-19 08:24:00       NA
#6: 123  started 2014-02-19 09:45:00       NA
#7: 123 finished 2014-02-19 10:33:00       48
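
The pairing rule from the question (a "finished" row counts only against the immediately preceding "started" row of the same id) can also be written with `shift()` grouped by `id`. This is a minimal sketch, not part of the original answer. The helper columns `prevAction` and `prevTime` are hypothetical names introduced for illustration, and `tz = "UTC"` is an assumption made so the example does not depend on the local time zone.

```r
library(data.table)

DT <- data.table(
  id     = c("123", "123", "125", "125", "123", "123", "123"),
  action = c("started", "finished", "started", "finished",
             "started", "started", "finished"),
  time   = as.POSIXct(c("2014-02-19 03:24:00", "2014-02-19 03:29:00",
                        "2014-02-19 03:30:00", "2014-02-19 03:34:00",
                        "2014-02-19 08:24:00", "2014-02-19 09:45:00",
                        "2014-02-19 10:33:00"), tz = "UTC")  # tz is an assumption
)

# Previous action and time within each id (hypothetical helper columns)
DT[, prevAction := shift(action), by = id]
DT[, prevTime   := shift(time),   by = id]

# Start with NA everywhere, then fill only the "finished" rows whose
# previous row for the same id is a "started" row
DT[, durationInMinutes := NA_real_]
DT[action == "finished" & prevAction == "started",
   durationInMinutes := as.numeric(difftime(time, prevTime, units = "mins"))]

# Drop the helper columns again
DT[, c("prevAction", "prevTime") := NULL]
DT
```

Because the grouping is by `id` rather than by consecutive runs, this variant would also pair rows when records of different ids are interleaved between a "started" and its "finished" row.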
