
How to split time series data with breaks in R?

I have an overall start date and end date, plus break dates, and I'd like to create multiple time series entries showing the dates actually worked. Grouped by ID, the first period should begin on the overall start date and the last period should end on the overall finish date, with the break dates marking the boundaries in between. Is there a simpler way of doing this than using a loop?

Data I have (dates are dd-mm-yy):

ID     Start       Finish      Break_start     Break_Finish     Break_Number
a      01-01-20    03-05-20    29-04-20        01-05-20         1
b      20-09-19    01-04-22    12-11-19        05-12-19         1
b      20-09-19    01-04-22    05-08-20        25-08-20         2
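The answer below works on a data frame called `df` holding this table. One way to recreate it for testing is `read.table()` with its `text` argument; this is a convenience sketch, not part of the original post, and it keeps the dates as dd-mm-yy character strings so the answer's own parsing step still applies.

```r
# Recreate the question's input table as a data frame named df.
# Dates stay as dd-mm-yy strings; they are parsed into Dates later.
df <- read.table(text = "
ID  Start     Finish    Break_start  Break_Finish  Break_Number
a   01-01-20  03-05-20  29-04-20     01-05-20      1
b   20-09-19  01-04-22  12-11-19     05-12-19      1
b   20-09-19  01-04-22  05-08-20     25-08-20      2
", header = TRUE, stringsAsFactors = FALSE)

str(df)
```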

Data wanted:

ID    Start_new       Finish_new
a     01-01-20        28-04-20
a     01-05-20        03-05-20
b     20-09-19        11-11-19
b     05-12-19        04-08-20
b     25-08-20        01-04-22

Thank you!

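With dplyr, you can group the data by ID and build every worked period without a loop: the periods start on the overall Start date and on each Break_Finish date, and they end the day before each Break_start date and on the overall Finish date.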

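library(dplyr)

df %>%
  # Parse the dd-mm-yy strings in columns Start to Break_Finish into Dates
  mutate(across(Start:Break_Finish, ~ as.Date(.x, format = "%d-%m-%y"))) %>%
  # Put each ID's breaks in chronological order
  arrange(ID, Break_start) %>%
  group_by(ID) %>%
  # Periods start on the overall Start or when work resumes (Break_Finish),
  # and end the day before a break (Break_start - 1) or on the overall Finish.
  # reframe() (dplyr >= 1.1.0) returns several rows per group, ungrouped;
  # with older dplyr, use summarise() followed by ungroup().
  reframe(Start_new = c(first(Start), Break_Finish),
          Finish_new = c(Break_start - 1, first(Finish)))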

# # A tibble: 5 × 3
#   ID    Start_new  Finish_new
#   <chr> <date>     <date>    
# 1 a     2020-01-01 2020-04-28
# 2 a     2020-05-01 2020-05-03
# 3 b     2019-09-20 2019-11-11
# 4 b     2019-12-05 2020-08-04
# 5 b     2020-08-25 2022-04-01
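The same per-ID logic can also be sketched in base R, without dplyr, using `split()` and `lapply()` instead of an explicit loop. This is an illustrative alternative, not part of the original answer: it assumes the column names from the question and dd-mm-yy dates, `split_periods` is a hypothetical helper name, and, like the dplyr version, it assumes each ID's breaks do not overlap and lie within its Start/Finish range.

```r
# Example data from the question (dates as dd-mm-yy strings)
df <- data.frame(
  ID = c("a", "b", "b"),
  Start = c("01-01-20", "20-09-19", "20-09-19"),
  Finish = c("03-05-20", "01-04-22", "01-04-22"),
  Break_start = c("29-04-20", "12-11-19", "05-08-20"),
  Break_Finish = c("01-05-20", "05-12-19", "25-08-20"),
  Break_Number = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Parse all date columns at once
date_cols <- c("Start", "Finish", "Break_start", "Break_Finish")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d-%m-%y")

# Hypothetical helper: build the worked periods for the rows of one ID
split_periods <- function(d) {
  d <- d[order(d$Break_start), ]  # breaks in chronological order
  data.frame(
    ID = d$ID[1],
    # work starts on the overall Start, then again on each Break_Finish
    Start_new = c(d$Start[1], d$Break_Finish),
    # work ends the day before each break, then on the overall Finish
    Finish_new = c(d$Break_start - 1, d$Finish[1]),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each ID and stack the results
res <- do.call(rbind, lapply(split(df, df$ID), split_periods))
rownames(res) <- NULL
res
```

`split()` returns the IDs in sorted order, so the rows come out grouped by ID just like the dplyr result; `rownames(res) <- NULL` only tidies the row labels that `rbind()` builds from the list names.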

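Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM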