Time difference between two rows with NA

Question

I have a data frame similar to the following (although with 80000 rows), where the first column is "Date.Time" and the remaining columns are variables that contain some NA values. A reproducible example:

df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
           "2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

What I need is the maximum time interval spanned by consecutive rows without NA. In the example above, the value I need for Variable1 is Date[7,1]-Date[4,1] (since Date[2,1]-Date[1,1] is a shorter interval), while for Variable2 it is Date[7,1]-Date[3,1].

I've been trying the rle() function, which gives, for each variable, the runs of NA and non-NA values:

is.na.rle222 <- rle(is.na(df[, "Variable1"]))

But this only gives me the length of the longest run, with no link to the dates.

Hope my question is clear.

Thanks in advance

Answer 1

You can split Date into runs of consecutive NA / non-NA rows and get the time span of each run using range and diff, then take the maximum:

i <- cumsum(c(1, abs(diff(is.na(df$Variable1)))))
x <- lapply(split(as.POSIXct(df$Date), i), function(x) diff(range(x)))
x[[which.max(x)]]
#Time difference of 5 mins
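
Note that the split above also produces groups for the NA runs. In the example each NA run is a single row (span 0), so it cannot win, but a long stretch of NA rows could. A minimal sketch of one way to guard against that, reusing the same run index and dropping the NA rows before splitting. The names na, times, spans and best are illustrative, and tz = "UTC" is an assumption made only to keep the example independent of the local time zone:

```r
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
           "2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

na <- is.na(df$Variable1)

# Run index: increases by 1 every time the NA status flips.
# For Variable1 this is 1 1 2 3 3 3 3 4.
i <- cumsum(c(1, abs(diff(na))))

# Assumption: timestamps are parsed as UTC.
times <- as.POSIXct(df$Date, tz = "UTC")

# Span in seconds of each run, computed on the non-NA rows only.
spans <- sapply(split(times[!na], i[!na]),
                function(x) as.numeric(diff(range(x)), units = "secs"))

# Run index of the longest non-NA run, then its first and last timestamp.
best <- as.numeric(names(which.max(spans)))
max(spans)                       # 300 seconds, i.e. 5 minutes
range(times[!na & i == best])    # 09:54:00 and 09:59:00
```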

Answer 2

Using the logic from @GKi with dplyr and trying to be more explicit:

library(dplyr)
(
  df
  %>% mutate(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(Variable1))))),
             Var2_interval_grp = cumsum(c(1, abs(diff(is.na(Variable2))))))
  %>% group_by(Var1_interval_grp)
  %>% mutate(Range_Var1 = diff(range(as.POSIXct(Date))))
  %>% ungroup
  %>% group_by(Var2_interval_grp)
  %>% mutate(Range_Var2 = diff(range(as.POSIXct(Date))))
  %>% ungroup
  %>% select(! contains("grp"))
) -> df

The output is now:

> df 
# A tibble: 8 x 5
  Date                Variable1 Variable2 Range_Var1 Range_Var2
  <chr>                   <dbl>     <dbl> <drtn>     <drtn>    
1 2020-01-01 09:50:00        10         1  90 secs     0 secs  
2 2020-01-01 09:51:30        15        NA  90 secs     0 secs  
3 2020-01-01 09:53:00        NA         2   0 secs   360 secs  
4 2020-01-01 09:54:00        25         5 300 secs   360 secs  
5 2020-01-01 09:55:00        22         8 300 secs   360 secs  
6 2020-01-01 09:57:30        10         6 300 secs   360 secs  
7 2020-01-01 09:59:00        11         8 300 secs   360 secs  
8 2020-01-01 10:01:00        NA        NA   0 secs     0 secs

It is then easy to get the dates of the rows that belong to the longest interval:

(
  df
  %>% filter(Range_Var1 == max(Range_Var1))
  %>% pull(Date)
)

which produces:

[1] "2020-01-01 09:54:00" "2020-01-01 09:55:00" "2020-01-01 09:57:30"
[4] "2020-01-01 09:59:00"
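
Since the question started from rle(), here is a sketch of how the run lengths can be tied back to the dates: the cumulative sum of the run lengths gives the last row of each run, and subtracting the length gives the first. The helper name longest_span and its return layout are hypothetical, base R only, and tz = "UTC" is again an assumption:

```r
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
           "2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

# Hypothetical helper: start, end and duration (seconds) of the longest
# run of consecutive non-NA values. Returns NULL if every value is NA.
longest_span <- function(values, times) {
  r <- rle(!is.na(values))
  end <- cumsum(r$lengths)         # last row of each run
  start <- end - r$lengths + 1     # first row of each run
  keep <- r$values                 # TRUE for runs of observed values
  if (!any(keep)) return(NULL)
  start <- start[keep]
  end <- end[keep]
  secs <- as.numeric(difftime(times[end], times[start], units = "secs"))
  k <- which.max(secs)
  data.frame(start = times[start[k]], end = times[end[k]], secs = secs[k])
}

# Assumption: timestamps are parsed as UTC.
times <- as.POSIXct(df$Date, tz = "UTC")

# One row per variable column.
res <- do.call(rbind, lapply(df[c("Variable1", "Variable2")], longest_span, times = times))
res$secs   # 300 for Variable1, 360 for Variable2
```

With 80000 rows this stays cheap, since rle() and cumsum() are vectorised and the only loop is over the variable columns.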

2 answers

solution1
2 2020-08-25 14:22:55

solution2
2 ACCEPTED 2020-08-25 14:45:26