
Trying to determine if two ranges of dates overlap using R

I have a dataset that records the schools each student attended within an academic year, along with the entry and withdrawal dates for each school. Most students attend only one school, but some have attended up to four. I would like to make sure that none of a student's date ranges overlap. Below is an example of the data (the date columns are of class Date):

| entry_date_1 | withdrawal_date_1 | entry_date_2 | withdrawal_date_2 |
|--------------|-------------------|--------------|-------------------|
| 2017-11-09   | 2018-05-24        | NA           | NA                |
| 2017-08-14   | 2017-12-15        | 2017-12-16   | 2018-05-24        |
| 2017-08-14   | 2018-06-01        | 2018-01-16   | 2018-03-20        |
| 2018-01-24   | 2018-02-25        | 2018-04-03   | 2018-05-24        |

Ideally, I would like a column holding a logical value, like this:

| entry_date_1 | withdrawal_date_1 | entry_date_2 | withdrawal_date_2 | overlap? |
|--------------|-------------------|--------------|-------------------|----------|
| 2017-11-09   | 2018-05-24        | NA           | NA                | NA       |
| 2017-08-14   | 2017-12-15        | 2017-12-16   | 2018-05-24        | FALSE    |
| 2017-08-14   | 2018-06-01        | 2018-01-16   | 2018-03-20        | TRUE     |
| 2018-01-24   | 2018-02-25        | 2018-04-03   | 2018-05-24        | FALSE    |

I tried doing this with the %overlaps% operator from the DescTools package, but it doesn't return TRUE or FALSE for any row, just NA. If someone could help me troubleshoot the issue, that would be great. Any other suggestions would also be helpful. I'm most comfortable with the tidyverse and base R and less comfortable with data.table.

Below is a snippet of data for a reproducible example:

my_data <- data.frame("student_id" = 1:6, 
                      "entry_date_1" = as.Date(c("2017-11-09","2017-08-14","2017-08-14","2018-01-24","2017-10-04","2017-08-14")), 
                      "withdrawal_date_1" = as.Date(c("2018-05-24","2017-12-15","2018-06-01","2018-02-25","2017-11-11","2018-05-24")),
                      "entry_date_2" = as.Date(c(NA,"2017-12-16","2018-01-16","2018-04-03","2017-12-12",NA)), 
                      "withdrawal_date_2" = as.Date(c(NA,"2018-05-24","2018-03-20","2018-05-24","2018-05-24",NA)))

Thanks in advance for any help!

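You can use int_overlaps() from lubridate, applied to two intervals built with interval(). It treats intervals as closed, so two ranges that share an endpoint count as overlapping.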

library(dplyr)
library(lubridate)

my_data %>%
  mutate(overlap = int_overlaps(interval(entry_date_1, withdrawal_date_1),
                                interval(entry_date_2, withdrawal_date_2)))

#   student_id entry_date_1 withdrawal_date_1 entry_date_2 withdrawal_date_2 overlap
# 1          1   2017-11-09        2018-05-24         <NA>              <NA>      NA
# 2          2   2017-08-14        2017-12-15   2017-12-16        2018-05-24   FALSE
# 3          3   2017-08-14        2018-06-01   2018-01-16        2018-03-20    TRUE
# 4          4   2018-01-24        2018-02-25   2018-04-03        2018-05-24   FALSE
# 5          5   2017-10-04        2017-11-11   2017-12-12        2018-05-24   FALSE
# 6          6   2017-08-14        2018-05-24         <NA>              <NA>      NA
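
If you would rather avoid an extra package, the same check can be sketched in base R. Two closed ranges overlap exactly when each one starts on or before the day the other ends. The helper name ranges_overlap() below is hypothetical (it is not part of base R or lubridate), and the sketch assumes every entry date is on or before its withdrawal date.

```r
# Hypothetical helper: TRUE when two closed date ranges share at least one day.
# Assumes start <= end within each range. A range with missing dates gives NA
# here, because both comparisons are NA for that row.
ranges_overlap <- function(start_a, end_a, start_b, end_b) {
  start_a <= end_b & start_b <= end_a
}

# Same sample data as in the question.
my_data <- data.frame(
  student_id = 1:6,
  entry_date_1 = as.Date(c("2017-11-09", "2017-08-14", "2017-08-14",
                           "2018-01-24", "2017-10-04", "2017-08-14")),
  withdrawal_date_1 = as.Date(c("2018-05-24", "2017-12-15", "2018-06-01",
                                "2018-02-25", "2017-11-11", "2018-05-24")),
  entry_date_2 = as.Date(c(NA, "2017-12-16", "2018-01-16",
                           "2018-04-03", "2017-12-12", NA)),
  withdrawal_date_2 = as.Date(c(NA, "2018-05-24", "2018-03-20",
                                "2018-05-24", "2018-05-24", NA))
)

# Vectorised over rows, so no loop or apply() is needed.
my_data$overlap <- ranges_overlap(
  my_data$entry_date_1, my_data$withdrawal_date_1,
  my_data$entry_date_2, my_data$withdrawal_date_2
)

print(my_data[, c("student_id", "overlap")])
```

For students with three or four schools, one option would be to call the same helper on each pair of ranges (1 vs 2, 1 vs 3, and so on) and combine the results, deciding explicitly how NA pairs should be treated.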
