Subtracting dates based on conditions using dplyr in r

Question

Below is an example of a table I am working with.

df = data.frame(Test_ID = c('a1','a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a2'), 
            Event_ID = c('Failure_x', 'Failure_x', 'Failure_y', 'Failure_y', 'Failure_x',
                         'Failure_x', 'Failure_y', 'Failure_x', 'Failure_y', 'Failure_y',
                         'Failure_x','Failure_x', 'Failure_y'),
            Fail_Date = c('2018-10-10 17:52:20', '2018-10-11 17:02:16', '2018-10-14 12:52:20',
                          '2018-11-11 16:18:34', '2018-11-12 17:03:06', '2018-11-25 10:50:10',
                          '2018-12-01 10:28:50', '2018-09-12 19:02:08', '2018-09-20 11:32:25',
                          '2018-10-13 14:43:30', '2018-10-15 14:22:28', '2018-10-30 21:55:45',
                          '2018-11-17 11:53:35'))

I want to subtract the failure dates (by Test_ID) only where Failure_y occurs after Failure_x. The Fail_Date for Event_ID Failure_y will be subtracted from the Fail_Date for Event_ID Failure_x. Within a group I can have multiple Failure_y's. The second Failure_y will be subtracted from the Failure_x occurring after the first instance of Failure_y.

I have tried to use dplyr to create a column TIME_BETWEEN_FAILURES.

library(lubridate)
df$Fail_Date = as.POSIXct(as.character(as.factor(df$Fail_Date)),format="%Y-%m-%d %H:%M:%S")
df = df %>% group_by(Test_ID) %>% 
mutate(TIME_BETWEEN_FAILURES = ifelse(Event_ID == "Failure_y" & lag(Event_ID) == "Failure_x", 
                                    difftime(Fail_Date, first(Fail_Date),units = "hours"),''))`

I was able to create the Time_BETWEEN_FAILURES correctly only for the first instance using first() in dplyr. That's where I am currently stuck. Any help on this matter will be appreciated.

This is result from the code snippet above.

Output required for analysis.
This is ideal response needed for my analysis.

Thanks. Cheers.

Answer 1

df %>% 
  group_by(gr = rev(cumsum(rev(Event_ID)=="Failure_y")), Test_ID) %>%
  mutate(time_between_failures = ifelse(n() > 1 & Event_ID=="Failure_y", difftime(Fail_Date[n()], Fail_Date[1L], units = "hours"), NA)) 

# A tibble: 13 x 5
# Groups:   gr, Test_ID [6]
   Test_ID Event_ID  Fail_Date              gr time_between_failures
   <fct>   <fct>     <dttm>              <int>                 <dbl>
 1 a1      Failure_x 2018-10-10 17:52:20     6                   NA 
 2 a1      Failure_x 2018-10-11 17:02:16     6                   NA 
 3 a1      Failure_y 2018-10-14 12:52:20     6                   91 
 4 a1      Failure_y 2018-11-11 16:18:34     5                   NA 
 5 a1      Failure_x 2018-11-12 17:03:06     4                   NA 
 6 a1      Failure_x 2018-11-25 10:50:10     4                   NA 
 7 a1      Failure_y 2018-12-01 10:28:50     4                  449.
 8 a2      Failure_x 2018-09-12 19:02:08     3                   NA 
 9 a2      Failure_y 2018-09-20 11:32:25     3                  185.
10 a2      Failure_y 2018-10-13 14:43:30     2                   NA 
11 a2      Failure_x 2018-10-15 14:22:28     1                   NA 
12 a2      Failure_x 2018-10-30 21:55:45     1                   NA 
13 a2      Failure_y 2018-11-17 11:53:35     1                  790.

Subtracting dates based on conditions using dplyr in r

Question

1 answers

solution1
1 ACCPTED 2019-03-12 09:15:23

Subtracting dates based on conditions using dplyr in r

Question

1 answers

solution1 1 ACCPTED 2019-03-12 09:15:23

solution1
1 ACCPTED 2019-03-12 09:15:23