I have a dataframe like this:
timestamp Status
05-01-2020 12:07:08 0
05-01-2020 12:36:05 1
05-01-2020 23:45:02 0
05-01-2020 13:44:33 1
06-01-2020 01:07:08 1
06-01-2020 10:23:05 1
06-01-2020 12:11:08 1
06-01-2020 22:06:12 1
07-01-2020 00:01:05 0
07-01-2020 02:17:09 1
07-01-2020 12:36:05 1
07-01-2020 12:07:08 1
07-01-2020 12:36:05 1
07-01-2020 12:36:05 0
08-01-2020 12:36:05 1
08-01-2020 12:36:05 0
08-01-2020 12:36:05 0
09-01-2020 12:36:05 1
09-01-2020 12:07:08 0
09-01-2020 12:36:05 1
11-01-2020 12:07:08 0
11-01-2020 12:36:05 1
I am trying to find the duration between each 1, 0 pair. In my data the statuses can come in any order: 1 and 0 may alternate one by one, or many 1s may be followed by a 0, and so on. The first occurrence of 1 is the start and the first occurrence of 0 after it is the end, no matter how many other 1s lie in between. If the start (1) is on one day and the end (0) is on a later day, I want to cut the duration in two at midnight, provided the dates are consecutive (like 1, 2, 3, 4).
I am able to calculate the duration in the straightforward case where 1 and 0 are on the same date. If they fall on two dates, I am also able to calculate the difference between the occurrence of 1 and 23:59:59 on the first day, and similarly from 00:00:00 until the occurrence of 0 on the second day.
For example, take this set of data:
07-01-2020 21:26:05 1
08-01-2020 02:33:45 0
These two fall on two different dates, so instead of finding the difference directly I want to cut it into two. On the first day (07-01-2020) my duration will be from 21:26:05 to 23:59:59, and on the second day it will be from 00:00:00 to 02:33:45. This should repeat for any number of consecutive dates (like 7, 8, 9, 10, etc.).
But if I have data like this:
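A minimal sketch of the midnight split described above, in base R. The helper name `split_at_midnight` is hypothetical (it is not part of the original post), and it assumes the timestamps are already POSIXct in UTC, so there are no daylight-saving shifts. It only illustrates the cut for a single start/end pair.

```r
# Hypothetical helper: split one start/end interval at every midnight it crosses.
# Assumes `start` and `end` are single POSIXct values in UTC with start <= end.
split_at_midnight <- function(start, end, tz = "UTC") {
  # First midnight after `start`
  first_midnight <- as.POSIXct(format(start, "%Y-%m-%d", tz = tz), tz = tz) + 86400
  # All midnights strictly before `end` (empty POSIXct if the interval stays within one day)
  cuts <- if (first_midnight < end) seq(first_midnight, end, by = "day") else start[0]
  data.frame(
    from = c(start, cuts),      # each piece starts at `start` or at 00:00:00
    to   = c(cuts - 1, end)     # each piece ends at 23:59:59 or at `end`
  )
}

pieces <- split_at_midnight(
  as.POSIXct("2020-01-07 21:26:05", tz = "UTC"),
  as.POSIXct("2020-01-08 02:33:45", tz = "UTC")
)
# Duration of each piece in minutes: 153.90 on the first day, 153.75 on the second
pieces$mins <- as.numeric(difftime(pieces$to, pieces$from, units = "mins"))
```

An interval spanning three or more consecutive days yields one full-day piece (00:00:00 to 23:59:59) for every day in the middle.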
07-01-2020 21:26:05 1
08-01-2020 02:33:45 1
09-01-2020 21:26:05 1
11-01-2020 02:33:45 1
I have to cut at the following points (after the 9th comes the 11th, so continuity is broken):
07-01-2020 21:26:05 to 07-01-2020 23:59:59
08-01-2020 00:00:00 to 08-01-2020 02:33:45
08-01-2020 02:33:45 to 08-01-2020 23:59:59
09-01-2020 00:00:00 to 09-01-2020 21:26:05
09-01-2020 21:26:05 to 09-01-2020 23:59:59
Conditions like this:
07-01-2020 21:26:05 1
07-01-2020 22:33:45 1
07-01-2020 23:31:51 1
07-01-2020 23:48:33 0
07-01-2020 23:57:12 0
is the same as:
07-01-2020 21:26:05 1
07-01-2020 23:48:33 0
And conditions like this:
07-01-2020 21:26:05 1
07-01-2020 22:33:45 1
07-01-2020 23:31:51 1
08-01-2020 03:48:33 0
08-01-2020 03:57:12 0
is the same as:
07-01-2020 21:26:05 to 07-01-2020 23:59:59
08-01-2020 00:00:00 to 08-01-2020 03:48:33
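The "first 1 is the start, first 0 is the end" rule from the two examples above could be sketched in base R as follows. The function name `pair_first_on_off` is hypothetical, and the sketch assumes the rows are already sorted by time and that `status` contains only 0 and 1. It only collapses repeated statuses and pairs each start with its end; it does not do the midnight split.

```r
# Hypothetical helper: pair the first 1 of each run with the first 0 that follows it.
# Assumes `timestamp` is sorted in ascending order and `status` holds only 0/1.
pair_first_on_off <- function(timestamp, status) {
  # Keep only the first row of each run of identical statuses
  keep <- c(TRUE, diff(status) != 0)
  ts <- timestamp[keep]
  st <- status[keep]
  # After collapsing, statuses alternate, so a 1 followed by a 0 is one start/end pair
  starts <- which(st == 1 & c(st[-1], NA) == 0)
  data.frame(start = ts[starts], end = ts[starts + 1])
}

ts <- as.POSIXct(c("2020-01-07 21:26:05", "2020-01-07 22:33:45",
                   "2020-01-07 23:31:51", "2020-01-08 03:48:33",
                   "2020-01-08 03:57:12"), tz = "UTC")
pairs <- pair_first_on_off(ts, c(1, 1, 1, 0, 0))
# One pair: start 2020-01-07 21:26:05, end 2020-01-08 03:48:33
```

A trailing 1 with no later 0 produces no pair here, so the case with only 1s on consecutive days would need separate handling.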
I tried an ifelse condition in data.table and was able to do the first split from x to 23:59:59 on the first day, but none of the other conditions work.
df[, difference := ifelse((df$Status == 0 & shift(df$Status,type='lag') == 1) & (as.Date(df$timestamp) != shift(as.Date(df$timestamp),type = 'lag')),
as.numeric(df$timestamp - as.POSIXct(paste0(as.Date(timestamp)," ","00:00:00"),tz="UTC"),units='mins'),ifelse((df$Status == 1 & shift(df$Status,type='lead') == 0) & as.Date(df$timestamp) != shift(as.Date(df$timestamp),type = 'lead'),as.numeric(as.POSIXct(paste0(as.Date(timestamp)," ","23:59:59"),tz="UTC") - df$timestamp,units='mins'),
as.numeric(shift(df$timestamp,type = 'lead') - df$timestamp,units='mins')))]
library(tidyverse)

# Non-daily split: keep the first 0 and the first 1 of each group, then difference them
df %>%
  mutate(grp = cumsum(ifelse(ind == 0, 1, 0))) %>%
  group_by(grp) %>%
  filter(!(duplicated(ind))) %>%
  ungroup() %>%
  mutate(duration = difftime(timestamp, lag(timestamp), units = "hours"))

# Daily split: same idea, but de-duplicate and difference within each calendar day
df %>%
  group_by(grp1 = as.Date(timestamp, "%Y-%m-%d")) %>%
  filter(!duplicated(ind)) %>%
  ungroup() %>%
  mutate(grp = cumsum(ifelse(ind == 0, 1, 0))) %>%
  group_by(grp, grp1) %>%
  mutate(duration = difftime(timestamp, lag(timestamp), units = "hours")) %>%
  ungroup()
Let
A = data.frame(timestamp = c(as.POSIXlt("2020-07-01 21:26:05"),
as.POSIXlt("2020-07-02 02:33:45"),
as.POSIXlt("2020-07-02 10:33:45"),
as.POSIXlt("2020-07-03 15:33:45"),
as.POSIXlt("2020-07-04 02:33:45")),
ind = as.numeric(c(0, 1, 1, 0, 1) ))
> A
timestamp ind
1 2020-07-01 21:26:05 0
2 2020-07-02 02:33:45 1
3 2020-07-02 10:33:45 1
4 2020-07-03 15:33:45 0
5 2020-07-04 02:33:45 1
be toy data for this example. Then the following code gives you the time distance between the first occurrences of successive 0s and 1s.
A %>%
  mutate(Diff = ind - lag(ind)) %>%        # 0 where the status repeats
  filter(is.na(Diff) | Diff != 0) %>%      # keep the first row of each run
  mutate(Timedist = timestamp - lag(timestamp)) %>%
  select(-Diff)
with output
timestamp ind Timedist
1 2020-07-01 21:26:05 0 NA hours
2 2020-07-02 02:33:45 1 5.1 hours
3 2020-07-03 15:33:45 0 37.0 hours
4 2020-07-04 02:33:45 1 11.0 hours