
Time difference between two rows with NA

I have a data frame similar to the following (although with 80000 rows), where the first column is "Date.Time" and the remaining columns are variables that contain some NA values. A reproducible example:

df <- data.frame(
Date= c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00", "2020-01-01 09:54:00",
"2020-01-01 09:55:00", "2020-01-01 09:57:30", "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
Variable1 = c(10,15,NA,25,22,10,11,NA),
Variable2 = c(1,NA,2,5,8,6,8,NA))

What I need is the maximum time interval spanned by consecutive rows without NA. In the example above, for Variable1 this is df$Date[7] - df$Date[4] (since df$Date[2] - df$Date[1] is a shorter interval), while for Variable2 it is df$Date[7] - df$Date[3].

I've been trying the rle() function, which gives, for each variable, the runs of NA and non-NA values:

is.na.rle222 <- rle(is.na(df[, "Variable1"]))

But this only gives me the length of the longest run, with no link to the dates.

Hope my question is clear.

Thanks in advance

You can label each run of consecutive NA or non-NA values, split Date by that label, and take the largest difference per group using range and diff:

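# Run id: increases by 1 each time Variable1 switches between NA and non-NA
i <- cumsum(c(1, abs(diff(is.na(df$Variable1)))))
# Time span (last minus first timestamp) of each run
x <- lapply(split(as.POSIXct(df$Date), i), function(d) diff(range(d)))
x[[which.max(x)]]
#Time difference of 5 mins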
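
Two things are worth keeping in mind with the split-based approach: the grouping also produces groups for runs of NA rows, and the per-group differences may be reported in different units. A minimal sketch of a variant that only looks at non-NA runs and always works in seconds follows. The helper name longest_non_na_span is hypothetical, and the timestamps are assumed to be in UTC and already sorted, which the question does not state:

```r
# Reproducible data from the question
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00",
           "2020-01-01 09:54:00", "2020-01-01 09:55:00", "2020-01-01 09:57:30",
           "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

# Hypothetical helper (not part of the original answer): start, end and
# length in seconds of the longest run of consecutive non-NA values.
longest_non_na_span <- function(time, value) {
  ok <- !is.na(value)
  if (!any(ok)) return(NULL)                      # nothing to measure
  run <- cumsum(c(1, abs(diff(ok))))              # run id, same idea as above
  secs <- as.numeric(as.POSIXct(time, tz = "UTC"))  # assumption: UTC timestamps
  first <- tapply(secs[ok], run[ok], min)         # first timestamp per non-NA run
  last <- tapply(secs[ok], run[ok], max)          # last timestamp per non-NA run
  span <- last - first
  k <- which.max(span)                            # first run with the largest span
  data.frame(
    start = as.POSIXct(first[[k]], origin = "1970-01-01", tz = "UTC"),
    end = as.POSIXct(last[[k]], origin = "1970-01-01", tz = "UTC"),
    seconds = span[[k]])
}

res1 <- longest_non_na_span(df$Date, df$Variable1)  # 09:54:00 to 09:59:00, 300 seconds
res2 <- longest_non_na_span(df$Date, df$Variable2)  # 09:53:00 to 09:59:00, 360 seconds
```

Returning the start and end timestamps alongside the length gives the link back to the dates that the question asks for.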

Using the logic from @GKi with dplyr and trying to be more explicit:

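library(dplyr)
(
  df
  # Run ids: increase by 1 each time a variable switches between NA and non-NA
  %>% mutate(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(Variable1))))),
             Var2_interval_grp = cumsum(c(1, abs(diff(is.na(Variable2))))))
  # Time span of each Variable1 run, repeated on every row of that run
  %>% group_by(Var1_interval_grp)
  %>% mutate(Range_Var1 = diff(range(as.POSIXct(Date))))
  %>% ungroup()
  # Same for Variable2
  %>% group_by(Var2_interval_grp)
  %>% mutate(Range_Var2 = diff(range(as.POSIXct(Date))))
  %>% ungroup()
  # Drop the helper grouping columns
  %>% select(!contains("grp"))
) -> df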

The output is now:

> df 
# A tibble: 8 x 5
  Date                Variable1 Variable2 Range_Var1 Range_Var2
  <chr>                   <dbl>     <dbl> <drtn>     <drtn>    
1 2020-01-01 09:50:00        10         1  90 secs     0 secs  
2 2020-01-01 09:51:30        15        NA  90 secs     0 secs  
3 2020-01-01 09:53:00        NA         2   0 secs   360 secs  
4 2020-01-01 09:54:00        25         5 300 secs   360 secs  
5 2020-01-01 09:55:00        22         8 300 secs   360 secs  
6 2020-01-01 09:57:30        10         6 300 secs   360 secs  
7 2020-01-01 09:59:00        11         8 300 secs   360 secs  
8 2020-01-01 10:01:00        NA        NA   0 secs     0 secs  

The dates that belong to the longest interval can then be extracted:

(
  df
  %>% filter(Range_Var1 == max(Range_Var1))
  %>% pull(Date)
) 

which produces:

[1] "2020-01-01 09:54:00" "2020-01-01 09:55:00" "2020-01-01 09:57:30"
[4] "2020-01-01 09:59:00"
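
Since the question starts from rle() and mentions many variable columns, here is a rough sketch of how the run lengths from rle() can be mapped back to row indices and then to dates, applied to every variable column at once. The helper name max_span_rle is hypothetical, and the timestamps are assumed to be in UTC and sorted:

```r
# Reproducible data from the question
df <- data.frame(
  Date = c("2020-01-01 09:50:00", "2020-01-01 09:51:30", "2020-01-01 09:53:00",
           "2020-01-01 09:54:00", "2020-01-01 09:55:00", "2020-01-01 09:57:30",
           "2020-01-01 09:59:00", "2020-01-01 10:01:00"),
  Variable1 = c(10, 15, NA, 25, 22, 10, 11, NA),
  Variable2 = c(1, NA, 2, 5, 8, 6, 8, NA))

time <- as.POSIXct(df$Date, tz = "UTC")  # assumption: UTC timestamps

# Hypothetical helper: longest span in seconds over runs of non-NA values
max_span_rle <- function(value, time) {
  r <- rle(!is.na(value))
  last <- cumsum(r$lengths)        # row index of the last element of each run
  first <- last - r$lengths + 1    # row index of the first element of each run
  keep <- r$values                 # TRUE for runs of non-NA values
  if (!any(keep)) return(NA_real_)
  max(as.numeric(difftime(time[last[keep]], time[first[keep]], units = "secs")))
}

vars <- setdiff(names(df), "Date")
spans <- sapply(df[vars], max_span_rle, time = time)
spans
# Variable1 Variable2
#       300       360
```

The first and last vectors are what connect the output of rle() to the dates: time[first] and time[last] are the timestamps at which each run starts and ends.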
