How to calculate the time difference between dates by group

I have a data frame containing date-times and locations. I would like to calculate the difference in minutes between each record and the previous record (ordered by date) within each group, and store the result in a new column with mutate.

I have worked out how to do it with a loop, but the loop treats all the groups (locations) together, and I am unsure how I would do this by group instead.

library(dplyr)  # provides %>%, arrange() and mutate()

# fake data set for example:
df <- data.frame(
  location = c(
    1,1,3,4,4,5,6,5,4,4,3,2,2,1,1,2,3,4,4,2
  ),
  date.time = c(
    "2017-10-22 04:49:23", "2017-10-23 01:02:06",
    "2017-10-23 01:09:17", "2017-10-23 18:32:46",
    "2017-10-24 18:50:19", "2017-11-01 03:07:24",
    "2017-11-01 19:05:58", "2017-11-02 01:56:48",
    "2017-11-02 01:58:16", "2017-11-02 02:00:38",
    "2017-11-06 19:53:56", "2017-11-09 13:08:39",
    "2017-09-18 01:25:27", "2017-09-19 05:19:43",
    "2017-09-21 21:42:33", "2017-09-22 00:49:16",
    "2017-09-22 03:48:05", "2017-09-22 20:56:57",
    "2017-09-23 19:09:48", "2017-09-24 05:52:35"
  ),
  time.diff.mins = NA
) %>% 
  arrange(date.time) %>% 
  mutate(
    date.time = as.POSIXct(
      date.time, 
      format = "%Y-%m-%d %H:%M:%S"
    )
  )

This gives:

   location           date.time time.diff.mins
1         2 2017-09-18 01:25:27             NA
2         1 2017-09-19 05:19:43             NA
3         1 2017-09-21 21:42:33             NA
4         2 2017-09-22 00:49:16             NA
5         3 2017-09-22 03:48:05             NA
...
...

For example, I would want the difference in minutes between row 4 and row 1 (both location 2) stored in the time.diff.mins column of row 4, and the difference between rows 3 and 2 (both location 1) stored in row 3. The calculation should continue this way for every record, always against the immediately preceding record of the same location.

This loop works for the entire data set, but I don't know how to integrate it with dplyr::group_by, for example, or some other method.

# date.time is already POSIXct, so it does not need to be parsed again
for (i in 2:nrow(df)) {
  # column 2 is date.time, column 3 is time.diff.mins
  df[i, 3] <- difftime(
    time1 = df[i, 2],
    time2 = df[i - 1, 2],
    units = "mins"
  )
}

This produces for example:

   location           date.time time.diff.mins
1         2 2017-09-18 01:25:27             NA
2         1 2017-09-19 05:19:43    1674.266667
3         1 2017-09-21 21:42:33    3862.833333
4         2 2017-09-22 00:49:16     186.716667
5         3 2017-09-22 03:48:05     178.816667
...
...

Any advice or guidance would be greatly appreciated!

To compute the difference within each location, group by 'location' and subtract the lagged date.time:

library(dplyr)
df %>%
    group_by(location) %>%
    mutate(time.diff.mins = difftime(date.time, lag(date.time), units = "mins"))
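
As a check on the grouped approach, here is a minimal sketch run on the first five rows shown in the question. The names df_small and result are made up for this illustration, and tz = "UTC" is an assumption added so the arithmetic does not depend on the local time zone. Wrapping difftime in as.numeric is optional; it only turns the column into plain numbers like the ones the loop produced.

```r
library(dplyr)

# Hypothetical sample: the first five rows printed in the question
df_small <- data.frame(
  location = c(2, 1, 1, 2, 3),
  date.time = as.POSIXct(
    c("2017-09-18 01:25:27", "2017-09-19 05:19:43",
      "2017-09-21 21:42:33", "2017-09-22 00:49:16",
      "2017-09-22 03:48:05"),
    format = "%Y-%m-%d %H:%M:%S",
    tz = "UTC"  # assumption: fixed time zone for reproducible differences
  )
)

result <- df_small %>%
  arrange(date.time) %>%      # rows must be in time order before lag()
  group_by(location) %>%      # lag() now looks back within each location only
  mutate(
    time.diff.mins = as.numeric(
      difftime(date.time, lag(date.time), units = "mins")
    )
  ) %>%
  ungroup()                   # drop the grouping so later steps are not grouped

print(result)
```

The first record of each location has no predecessor, so it stays NA. Row 3 should still show about 3862.83 minutes, as in the loop output, because rows 2 and 3 are both location 1. Row 4 is now compared with row 1 (both location 2) instead of row 3.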
