How to split time series data with breaks in R?
I have an overall start date and end date, plus break dates, and I'd like to create multiple time-series entries showing the actual dates worked. That means using the Start and Finish dates at the beginning and end of each ID's series, and the break dates in between. Is there a simpler way of doing this than using a loop?
Data I have (dates are dd-mm-yy):
ID Start Finish Break_start Break_Finish Break_Number
a 01-01-20 03-05-20 29-04-20 01-05-20 1
b 20-09-19 01-04-22 12-11-19 05-12-19 1
b 20-09-19 01-04-22 05-08-20 25-08-20 2
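To make the example reproducible, the input table can be rebuilt as a data frame. This is a sketch assuming the dates are stored as "dd-mm-yy" character strings, exactly as printed above. The name `df` matches what the answer below pipes from.

```r
# Input data from the question; dates kept as "dd-mm-yy" strings (assumption)
df <- data.frame(
  ID           = c("a", "b", "b"),
  Start        = c("01-01-20", "20-09-19", "20-09-19"),
  Finish       = c("03-05-20", "01-04-22", "01-04-22"),
  Break_start  = c("29-04-20", "12-11-19", "05-08-20"),
  Break_Finish = c("01-05-20", "05-12-19", "25-08-20"),
  Break_Number = c(1, 1, 2)
)
```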
Data wanted:
ID Start_new Finish_new
a 01-01-20 28-04-20
a 01-05-20 03-05-20
b 20-09-19 11-11-19
b 05-12-19 04-08-20
b 25-08-20 01-04-22
Thank you!
With dplyr, you could summarise the data by ID to get the start and finish dates of each worked period.
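library(dplyr)

df %>%
  # parse the four date columns, stored as "dd-mm-yy" strings
  mutate(across(c(Start, Finish, Break_start, Break_Finish),
                ~ as.Date(.x, format = "%d-%m-%y"))) %>%
  # keep each ID's breaks in chronological order
  arrange(ID, Break_Number) %>%
  group_by(ID) %>%
  # n breaks split the span into n + 1 periods; reframe() (dplyr >= 1.1.0)
  # returns several rows per group, where older code used summarise() + ungroup()
  reframe(Start_new  = c(first(Start), Break_Finish),
          Finish_new = c(Break_start - 1, first(Finish)))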
# # A tibble: 5 × 3
# ID Start_new Finish_new
# <chr> <date> <date>
# 1 a 2020-01-01 2020-04-28
# 2 a 2020-05-01 2020-05-03
# 3 b 2019-09-20 2019-11-11
# 4 b 2019-12-05 2020-08-04
# 5 b 2020-08-25 2022-04-01
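If you would rather avoid the dplyr dependency, the same idea can be sketched in base R. For each ID, the n breaks split the overall span into n + 1 worked periods: the starts are the overall Start followed by each Break_Finish, and the finishes are each Break_start minus one day followed by the overall Finish. This assumes the column names from the question and a single Start/Finish pair per ID. The helper `worked_periods` is a hypothetical name, and `split()` + `lapply()` still iterate over groups internally; they just avoid an explicit `for` loop.

```r
# Hypothetical helper: turn one ID's rows into its n + 1 worked periods
worked_periods <- function(d) {
  d <- d[order(d$Break_Number), ]  # breaks in chronological order
  data.frame(
    ID         = d$ID[1],
    Start_new  = c(d$Start[1], d$Break_Finish),
    Finish_new = c(d$Break_start - 1, d$Finish[1])
  )
}

df <- data.frame(
  ID           = c("a", "b", "b"),
  Start        = c("01-01-20", "20-09-19", "20-09-19"),
  Finish       = c("03-05-20", "01-04-22", "01-04-22"),
  Break_start  = c("29-04-20", "12-11-19", "05-08-20"),
  Break_Finish = c("01-05-20", "05-12-19", "25-08-20"),
  Break_Number = c(1, 1, 2)
)

# Parse the "dd-mm-yy" strings into Date objects
date_cols <- c("Start", "Finish", "Break_start", "Break_Finish")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d-%m-%y")

# Apply the helper per ID and stack the pieces into one data frame
result <- do.call(rbind, lapply(split(df, df$ID), worked_periods))
rownames(result) <- NULL
result
```

The rows come out grouped by ID in the same order as the dplyr version, with the same five periods.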