
How to calculate time difference between dates by group

I have a data frame containing date-times and locations. I would like to calculate the difference in minutes between each record and the previous record (ordered by date) within each group, and store it in a new column with mutate.

I have worked out how to do it using a loop, but this computes the differences across all the groups (locations) together, and I am unsure how I would do this by group instead.

# fake data set for example:
library(dplyr)  # provides %>%, arrange() and mutate()

df <- data.frame(
  location = c(
    1,1,3,4,4,5,6,5,4,4,3,2,2,1,1,2,3,4,4,2
  ),
  date.time = c(
    "2017-10-22 04:49:23", "2017-10-23 01:02:06",
    "2017-10-23 01:09:17", "2017-10-23 18:32:46",
    "2017-10-24 18:50:19", "2017-11-01 03:07:24",
    "2017-11-01 19:05:58", "2017-11-02 01:56:48",
    "2017-11-02 01:58:16", "2017-11-02 02:00:38",
    "2017-11-06 19:53:56", "2017-11-09 13:08:39",
    "2017-09-18 01:25:27", "2017-09-19 05:19:43",
    "2017-09-21 21:42:33", "2017-09-22 00:49:16",
    "2017-09-22 03:48:05", "2017-09-22 20:56:57",
    "2017-09-23 19:09:48", "2017-09-24 05:52:35"
  ),
  time.diff.mins = NA
) %>% 
  arrange(date.time) %>% 
  mutate(
    date.time = as.POSIXct(
      date.time, 
      format = "%Y-%m-%d %H:%M:%S"
    )
  )

This gives:

   location           date.time time.diff.mins
1         2 2017-09-18 01:25:27             NA
2         1 2017-09-19 05:19:43             NA
3         1 2017-09-21 21:42:33             NA
4         2 2017-09-22 00:49:16             NA
5         3 2017-09-22 03:48:05             NA
...
...

Thus, for example, I would want the difference in minutes between row 4 and row 1 (both location 2) in the time.diff.mins column of row 4. Likewise, row 3 of time.diff.mins would hold the time difference between rows 3 and 2 (both location 1). The calculation would then continue in the same way, always taking the difference from the immediately preceding record within the same location group.
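To make the target concrete, here is a small check of the value expected in row 4, using only the two location 2 timestamps from the table above. The variable names and the fixed `tz = "UTC"` are assumptions added for illustration, so that the result does not depend on the local time zone.

```r
# Rows 1 and 4 are the first two records for location 2.
t_row1 <- as.POSIXct("2017-09-18 01:25:27", tz = "UTC")
t_row4 <- as.POSIXct("2017-09-22 00:49:16", tz = "UTC")

# Difference in minutes, as a plain number rather than a difftime object.
expected_row4 <- as.numeric(difftime(t_row4, t_row1, units = "mins"))
expected_row4  # roughly 5723.82 minutes
```

This differs from the 186.72 that the ungrouped loop writes into row 4, because the loop compares row 4 with row 3 regardless of location.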

This loop works for the entire data set, but I don't know how to integrate it with dplyr::group_by, for example, or some other method.

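# date.time is already POSIXct, so it does not need to be parsed again
# (the original "%Y:%m:%d" format string did not match the data anyway)
for (i in 2:nrow(df)) {
  df[i, "time.diff.mins"] <- as.numeric(
    difftime(
      time1 = df[i, "date.time"],
      time2 = df[i - 1, "date.time"],
      units = "mins"
    )
  )
}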

This produces, for example:

   location           date.time time.diff.mins
1         2 2017-09-18 01:25:27             NA
2         1 2017-09-19 05:19:43    1674.266667
3         1 2017-09-21 21:42:33    3862.833333
4         2 2017-09-22 00:49:16     186.716667
5         3 2017-09-22 03:48:05     178.816667
...
...

Any advice or guidance would be greatly appreciated!

If we need to group by 'location', we can use group_by() and take the difference from the lagged date.time within each group:

library(dplyr)
df %>%
    group_by(location) %>%
    # lag() gives the previous date.time within each location; the first row per group is NA
    mutate(time.diff.mins = difftime(date.time, lag(date.time), units = "mins"))
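As a minimal, self-contained sketch of the same idea, the following runs the grouped calculation on the first five rows shown in the question. The names `sample_df` and `result`, the explicit `tz = "UTC"`, the extra `arrange()` and the `as.numeric()` conversion are assumptions added here for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical sample: the first five rows of the question's sorted data.
sample_df <- data.frame(
  location = c(2, 1, 1, 2, 3),
  date.time = as.POSIXct(
    c("2017-09-18 01:25:27", "2017-09-19 05:19:43",
      "2017-09-21 21:42:33", "2017-09-22 00:49:16",
      "2017-09-22 03:48:05"),
    tz = "UTC"
  )
)

result <- sample_df %>%
  arrange(date.time) %>%      # make sure lag() sees records in time order
  group_by(location) %>%
  mutate(
    # as.numeric() drops the difftime class and keeps plain minutes
    time.diff.mins = as.numeric(
      difftime(date.time, lag(date.time), units = "mins")
    )
  ) %>%
  ungroup()                   # drop the grouping for later steps

result
```

The first record of each location has no predecessor, so its time.diff.mins is NA. Row 3 holds the difference from row 2 (location 1) and row 4 holds the difference from row 1 (location 2), which is the behaviour the question asks for.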

