How to calculate the time difference between datetimes for each group (student-contract)?

Question

I have a specific problem. I have data in the following format:

#   USER_ID SUBMISSION_DATE CONTRACT_REF
1        1       20/6 1:00         W001
2        1       20/6 2:00         W002
3        1       20/6 3:30         W003
4        4       20/6 4:00         W004
5        5       20/6 5:00         W005
6        5       20/6 6:00         W006
7        7       20/6 7:00         W007
8        7       20/6 8:00         W008
9        7       20/6 9:00         W009
10       7      20/6 10:00        W0010

Now I need to somehow calculate the time difference between the different submissions (each uniquely identifiable).

In other words: I have a table of submissions containing all submissions for all users. I need a way to calculate, for each unique STUDENT-CONTRACT tuple, the time difference between the nth assignment and the (n-1)th assignment.

Also note that the first assignment of each new user has to get zero. So the output would look as follows:

#   USER_ID SUBMISSION_DATE CONTRACT_REF  TIME_DIFFERENCE
1        1       20/6 1:00         W001                 0
2        1       20/6 2:00         W002              3600
3        1       20/6 3:30         W003              5400
4        4       20/6 4:00         W004                 0
5        5       20/6 5:00         W005                 0
6        5       20/6 6:00         W006              3600
7        7       20/6 7:00         W007                 0
8        7       20/6 8:00         W008              3600
9        7       20/6 9:00         W009              3600
10       7      20/6 10:00        W0010              3600

Note that the time difference does not have to be in seconds; any suitable unit is fine.

My thoughts: 1) I presume this will require as.POSIXct somewhere so that R knows how to handle the times. 2) This may involve a package such as plyr, but I am utterly lost in the documentation, and examples are hard to find.

Thank you very much for all responses!

Best, Jakub

Answer 1

Here's an attempt. Firstly, get the data:

dat <- read.csv(text="USER_ID,SUBMISSION_DATE,CONTRACT_REF
1,20/6 1:00,W001
1,20/6 2:00,W002
1,20/6 3:30,W003
4,20/6 4:00,W004
5,20/6 5:00,W005
5,20/6 6:00,W006
7,20/6 7:00,W007
7,20/6 8:00,W008
7,20/6 9:00,W009
7,20/6 10:00,W0010",header=TRUE)

Get the number from the contract ref and sort the data by user and contract number:

dat$CR_NUM <- as.numeric(gsub("W","",dat$CONTRACT_REF))
dat <- with(dat,dat[order(USER_ID,CR_NUM),])

Convert the date to a POSIXct numeric representation (seconds since the epoch):

dat$SD_DATE <- as.numeric(with(dat,as.POSIXct(SUBMISSION_DATE,format="%d/%m %H:%M")))

Calculate the time difference within each user, with a 0 at the start, using ave:

dat$TIME_DIFF <- with(dat, ave(SD_DATE, USER_ID, FUN=function(x) c(0,diff(x)) ))

Result:

# not showing the calculated columns
dat[-c(4:5)]

   USER_ID SUBMISSION_DATE CONTRACT_REF TIME_DIFF
1        1       20/6 1:00         W001         0
2        1       20/6 2:00         W002      3600
3        1       20/6 3:30         W003      5400
4        4       20/6 4:00         W004         0
5        5       20/6 5:00         W005         0
6        5       20/6 6:00         W006      3600
7        7       20/6 7:00         W007         0
8        7       20/6 8:00         W008      3600
9        7       20/6 9:00         W009      3600
10       7      20/6 10:00        W0010      3600
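Because the format string carries no year or time zone, as.POSIXct fills in the current year and the session's local time zone, so a daylight-saving change could distort a gap. The steps above can be sketched as one self-contained script that pins both down. The year 2013 and the UTC zone are assumptions added for reproducibility, not part of the question's data, and the rows are sorted by time rather than by contract number.

```r
# Example data from the question (CONTRACT_REF omitted for brevity)
dat <- data.frame(
  USER_ID = c(1, 1, 1, 4, 5, 5, 7, 7, 7, 7),
  SUBMISSION_DATE = c("20/6 1:00", "20/6 2:00", "20/6 3:30", "20/6 4:00",
                      "20/6 5:00", "20/6 6:00", "20/6 7:00", "20/6 8:00",
                      "20/6 9:00", "20/6 10:00"),
  stringsAsFactors = FALSE
)

# Assumption: year 2013 and UTC, so no daylight-saving jump affects the gaps
stamp <- as.POSIXct(paste("2013", dat$SUBMISSION_DATE),
                    format = "%Y %d/%m %H:%M", tz = "UTC")

# Sort by user, then by time, before differencing
ord <- order(dat$USER_ID, stamp)
dat <- dat[ord, ]
stamp <- stamp[ord]

# Seconds since the same user's previous submission, 0 for the first one
dat$TIME_DIFF <- ave(as.numeric(stamp), dat$USER_ID,
                     FUN = function(x) c(0, diff(x)))
print(dat)
```

Dividing TIME_DIFF by 60 or 3600 gives minutes or hours if seconds are not the preferred unit.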

Answer 2

Here's a slightly tighter version (with fewer "intermediate" columns). Note that using difftime rather than diff lets you choose your time units (seconds, minutes, hours, etc.).

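# Parse the timestamps; the format has no year, so R assumes the current one
dat$DATE2 <- as.POSIXct(dat$SUBMISSION_DATE, format="%d/%m %H:%M")
# Gap to the previous submission in hours, with 0 for the first one
getDtimes <- function(t) {
  if (length(t) > 0) c(0, difftime(t[-1], t[-length(t)], units="hours")) else 0
}
# tapply returns groups in sorted USER_ID order, so dat must be sorted by USER_ID
dat$DTime <- unlist(with(dat, tapply(DATE2, USER_ID, getDtimes)))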

The key (as above) is to convert times to POSIXt objects. tapply generates a list of the time difference vectors, which you then need to unlist.
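One caveat with unlist(tapply(...)): tapply returns the groups in sorted group order, so the unlisted vector lines up with the rows only when the data frame is already sorted by USER_ID. A minimal sketch of an order-independent variant uses split and unsplit, which put each group's result back on that group's own rows. The helper name gap_hours, the year 2013, and the UTC zone are illustrative assumptions, and rows are still expected to be in time order within each user.

```r
# Deliberately unsorted rows: user 7 appears before user 1
dat <- data.frame(
  USER_ID = c(7, 1, 7, 1),
  SUBMISSION_DATE = c("20/6 7:00", "20/6 1:00", "20/6 8:00", "20/6 3:30"),
  stringsAsFactors = FALSE
)

# Assumption: year 2013 and UTC
dat$DATE2 <- as.POSIXct(paste("2013", dat$SUBMISSION_DATE),
                        format = "%Y %d/%m %H:%M", tz = "UTC")

# Hours since the previous element of a POSIXct vector, 0 for the first
# (gap_hours is a hypothetical helper name)
gap_hours <- function(t) {
  c(0, as.numeric(difftime(t[-1], t[-length(t)], units = "hours")))
}

# Compute gaps per user, then map them back to the original row positions
gaps <- lapply(split(dat$DATE2, dat$USER_ID), gap_hours)
dat$DTime <- unsplit(gaps, dat$USER_ID)
print(dat)
```

On this input, unlist(tapply(...)) would instead return user 1's gaps first and attach them to user 7's rows.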
