Finding overlapping time between start time and end time of individuals in a group

I have:

     household       person     start time   end time
          1           1          07:45:00    21:45:00
          1           2          09:45:00    17:45:00
          1           3          22:45:00    23:45:00
          1           4          08:45:00    01:45:00
          1           1          06:45:00    19:45:00
          2           1          07:45:00    21:45:00
          2           2          16:45:00    22:45:00

I want to add a column that identifies overlapping time between family members.

That column should hold the index of the person or persons whose time intersects with that row's person.

In the example above, in the first household, the times of the first, second, and fourth persons intersect.

Output:

      household       person     start time   end time      overlap
          1           1          07:45:00    21:45:00           2,4
          1           2          09:45:00    17:45:00           1,4
          1           3          22:45:00    23:45:00            NA
          1           4          08:45:00    01:45:00           1,2
          1           1          18:45:00    19:45:00            NA     
          2           1          07:45:00    21:45:00            2
          2           2          16:45:00    22:45:00            1

NA means no intersection with any other family member; it could be 0 or any other placeholder.

Left join the input DF to itself, joining on other persons in the same household and on the overlap condition. Then group by row, concatenating the matched persons into a comma-separated string.
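To make the join-then-concatenate idea concrete, here is a minimal base R sketch of the same steps without sqldf. It uses only the first three rows of household 1 and assumes the simplest overlap test (both intervals run forward in time); the helper column `row` and the variable names `pairs`, `hit`, and `ov` are illustrative, not part of the original answer.

```r
# First three rows of household 1 from the question.
DF <- data.frame(
  household  = c(1L, 1L, 1L),
  person     = c(1L, 2L, 3L),
  start_time = c("07:45:00", "09:45:00", "22:45:00"),
  end_time   = c("21:45:00", "17:45:00", "23:45:00"),
  stringsAsFactors = FALSE
)
DF$row <- seq_len(nrow(DF))  # row identifier, playing the role of rowid

# Self-join within household: every pair of rows from the same household.
pairs <- merge(DF, DF, by = "household", suffixes = c(".a", ".b"))

# Keep pairs of different persons whose intervals overlap.
# Zero-padded "HH:MM:SS" strings sort chronologically, so plain
# string comparison is enough for this assumed (non-wrapping) case.
hit <- pairs$person.a != pairs$person.b &
  pairs$start_time.a <= pairs$end_time.b &
  pairs$start_time.b <= pairs$end_time.a
pairs <- pairs[hit, ]

# Concatenate the matched persons for each left-hand row.
ov <- vapply(split(pairs$person.b, pairs$row.a),
             function(p) paste(sort(unique(p)), collapse = ","),
             character(1))

# Rows without a match get NA, mimicking the left join.
DF$overlap <- unname(ov[as.character(DF$row)])
DF$row <- NULL
print(DF)
```

Looking up `ov` by row name is what reproduces the left-join behaviour: a row that matched nobody has no entry in `ov`, so it comes back as NA.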

In the absence of an explanation of what constitutes overlap, we try three different definitions of overlap. The third is the closest to the output shown in the question.

  1. If end_time < start_time then everything before end_time and everything after start_time is in the interval to be checked for overlap. The overlap condition then decomposes into 4 cases according to whether the left and right hand sides of the join satisfy this or not.

  2. If start_time > end_time on either the left or right hand side then we regard the two as not overlapping.

  3. If end_time < start_time then swap them and check for overlap as before.
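The three definitions can also be written as small predicates in base R, which may help when comparing them pair by pair. This is only a sketch: the function names (`to_sec`, `overlap_plain`, `pieces`, `overlap_def1`, `overlap_def2`, `overlap_def3`) are hypothetical, times are converted to seconds since midnight, and definition 1 is implemented by splitting a wrapping interval at midnight, so results can differ from the SQL versions when two intervals merely touch at an endpoint.

```r
# Hypothetical helpers, not part of the original answer.
# Convert "HH:MM:SS" strings to seconds since midnight.
to_sec <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Ordinary overlap of two closed intervals that run forward in time.
overlap_plain <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

# Split an interval that wraps past midnight into two forward pieces.
pieces <- function(s, e) {
  if (s <= e) list(c(s, e)) else list(c(s, 86400), c(0, e))
}

# Definition 1: end < start means the interval wraps past midnight.
overlap_def1 <- function(s1, e1, s2, e2) {
  for (p in pieces(s1, e1)) for (q in pieces(s2, e2)) {
    if (overlap_plain(p[1], p[2], q[1], q[2])) return(TRUE)
  }
  FALSE
}

# Definition 2: a wrapping interval never overlaps anything.
overlap_def2 <- function(s1, e1, s2, e2) {
  s1 <= e1 && s2 <= e2 && overlap_plain(s1, e1, s2, e2)
}

# Definition 3: swap start and end when they are out of order.
overlap_def3 <- function(s1, e1, s2, e2) {
  overlap_plain(min(s1, e1), max(s1, e1), min(s2, e2), max(s2, e2))
}

# Person 2 (09:45-17:45) against person 4 (08:45-01:45) in household 1.
a <- to_sec(c("09:45:00", "17:45:00"))
b <- to_sec(c("08:45:00", "01:45:00"))
print(c(def1 = overlap_def1(a[1], a[2], b[1], b[2]),
        def2 = overlap_def2(a[1], a[2], b[1], b[2]),
        def3 = overlap_def3(a[1], a[2], b[1], b[2])))
```

For this pair only definition 1 reports an overlap: definition 2 discards person 4's wrapping interval, and definition 3 turns it into 01:45-08:45, which ends before person 2 starts.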

First definition of overlap

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and
       (case 
          when a.start_time <= a.end_time and b.start_time <= b.end_time then 
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          when a.start_time <= a.end_time and b.start_time > b.end_time then
               not (a.start_time between b.end_time and b.start_time and
               a.end_time between b.end_time and b.start_time)
          when a.start_time > a.end_time and b.start_time <= b.end_time then
               not (b.start_time between a.end_time and a.start_time and
               b.end_time between a.end_time and a.start_time)
          else 1 end)
  group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00     1,4
3         1      3   22:45:00 23:45:00       4
4         1      4   08:45:00 01:45:00     2,3
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

Second definition of overlap

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and              
       (case
          when a.start_time <= a.end_time and b.start_time <= b.end_time then
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          else 0 end)
  group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00    <NA>
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

Third definition of overlap

sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
  select rowid, *, 
    min(start_time, end_time) as st,
    max(start_time, end_time) as en
  from DF)

  select a.household, a.person, a.start_time, a.end_time, 
      group_concat(distinct b.person) as overlap
    from DF2 a
    left join DF2 b 
      on a.household = b.household and 
         a.person != b.person and                  
         (a.st between b.st and b.en or
          b.st between a.st and a.en)
    group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00     2,4
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00       1
5         1      1   06:45:00 19:45:00     2,4
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

Note

We assume that the input DF in reproducible form is:

DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L, 
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00", 
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"), 
    end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00", 
    "19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA, 
-7L))
