
Find missing date range in R

I have a data frame of start and end dates, where each row represents a specific trip.

Those date ranges make up a continuous timeline, except around April, where there is a discontinuity (no data, because no trips were taken).

How can I find the start and end dates of that specific period, preferably using a tidy approach?

library(tidyverse)

df <- data.frame(
  start = as.Date(c(
    "2022-01-03", "2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08",
    "2022-03-09", "2022-04-15", "2022-04-20", "2022-04-20", "2022-05-03",
    "2022-05-17", "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22",
    "2022-06-28", "2022-07-11"
  )),
  end = as.Date(c(
    "2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08", "2022-03-09",
    "2022-03-25", "2022-04-20", "2022-04-20", "2022-05-03", "2022-05-17",
    "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22", "2022-06-28",
    "2022-07-11", "2022-07-17"
  ))
) %>%
  mutate(trip_number = as.character(row_number()))

df %>%
  ggplot() +
  geom_segment(aes(x = start, xend = end, y = 0, yend = 0, col = trip_number)) +
  theme(legend.position = "none")

Created on 2022-07-17 by the reprex package (v2.0.1)

A possible solution, which inserts each gap into the timeline as an extra row whose trip_number is NA:

library(tidyverse)
library(lubridate)


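df %>% 
  # Where a trip does not start on the previous trip's end date, record the
  # gap: date1 = previous end, date2 = current start (both NA otherwise)
  mutate(date1 = if_else(start == lag(end), NA_Date_, lag(end)),
         date2 = if_else(start == lag(end), NA_Date_, start)) %>% 
  # Append the recorded gaps as extra rows (trip_number is NA for these)
  bind_rows(tibble(start = .$date1, end = .$date2)) %>%
  # Drop the appended rows that are entirely NA, i.e. where there was no gap
  filter(!if_all(everything(), is.na)) %>% 
  arrange(start) %>% 
  select(!starts_with("date"))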

#>         start        end trip_number
#> 1  2022-01-03 2022-01-18           1
#> 2  2022-01-18 2022-01-31           2
#> 3  2022-01-31 2022-03-01           3
#> 4  2022-03-01 2022-03-08           4
#> 5  2022-03-08 2022-03-09           5
#> 6  2022-03-09 2022-03-25           6
#> 7  2022-03-25 2022-04-15        <NA>
#> 8  2022-04-15 2022-04-20           7
#> 9  2022-04-20 2022-04-20           8
#> 10 2022-04-20 2022-05-03           9
#> 11 2022-05-03 2022-05-17          10
#> 12 2022-05-17 2022-05-17          11
#> 13 2022-05-17 2022-05-31          12
#> 14 2022-05-31 2022-06-05          13
#> 15 2022-06-05 2022-06-22          14
#> 16 2022-06-22 2022-06-28          15
#> 17 2022-06-28 2022-07-11          16
#> 18 2022-07-11 2022-07-17          17
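
If only the missing period itself is wanted, rather than the full timeline with the gap filled in, a shorter variant may be enough. The following is a minimal sketch, not part of the original answer. It assumes the trips can be sorted by start date and that no trip is nested inside a longer one, so that comparing each start with the previous end is sufficient. The names gaps, gap_start and gap_end are made up for illustration.

```r
library(dplyr)

# Same trip data as in the question (start and end dates only)
df <- data.frame(
  start = as.Date(c(
    "2022-01-03", "2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08",
    "2022-03-09", "2022-04-15", "2022-04-20", "2022-04-20", "2022-05-03",
    "2022-05-17", "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22",
    "2022-06-28", "2022-07-11"
  )),
  end = as.Date(c(
    "2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08", "2022-03-09",
    "2022-03-25", "2022-04-20", "2022-04-20", "2022-05-03", "2022-05-17",
    "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22", "2022-06-28",
    "2022-07-11", "2022-07-17"
  ))
)

gaps <- df %>%
  arrange(start) %>%
  # End date of the previous trip (NA for the first trip)
  mutate(gap_start = lag(end)) %>%
  # Keep only trips that start strictly after the previous trip ended
  filter(!is.na(gap_start), start > gap_start) %>%
  # Report the gap as previous end -> current start
  transmute(gap_start, gap_end = start)

gaps
#>    gap_start    gap_end
#> 1 2022-03-25 2022-04-15
```

With overlapping or nested trips, the previous row's end is not necessarily the latest end seen so far, so this comparison could report gaps that do not exist. In that case a running maximum of the end dates would be a safer reference point.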
