
From a List of Intervals, Finding all Sets of Intervals where EACH Interval in One Set Overlaps with All Intervals in that Set

Instead of querying a list of intervals with a single start and end date to retrieve every interval in the list that overlaps that one search range, what is the best approach to the following?

From a list of date intervals, find all unique sets of intervals in which every interval overlaps every other interval in that set.

As an integer example, take the list of intervals [{1,3},{2,4},{4,5},{5,7},{6,8}]. From this list, the unique sets of intervals in which every interval overlaps every other interval in the set are:

{ {1,3}, {2,4} }
{ {2,4}, {4,5} }
{ {4,5}, {5,7} }
{ {5,7}, {6,8} }
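These sets treat intervals as closed, so two intervals that merely share an endpoint, such as {2,4} and {4,5}, count as overlapping. A small sketch, assuming plain (start, end) tuples in place of the {a,b} notation, that lists every overlapping pair in the example:

```python
from itertools import combinations


def overlaps(a, b):
    """Closed-interval overlap test: sharing an endpoint counts."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = [(1, 3), (2, 4), (4, 5), (5, 7), (6, 8)]
# combinations yields each unordered pair once, in list order.
pairs = [(a, b) for a, b in combinations(intervals, 2) if overlaps(a, b)]
print(pairs)
# [((1, 3), (2, 4)), ((2, 4), (4, 5)), ((4, 5), (5, 7)), ((5, 7), (6, 8))]
```

No three intervals in this list are mutually overlapping, so the four pairs printed are exactly the four sets above.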

Here is the class for a DateInterval:

from datetime import datetime
class DateInterval(object):
    def __init__(self, start_time, end_time):
        self.start_time = datetime.strptime(start_time, '%Y-%m-%d %H:%M:%S')
        self.end_time = datetime.strptime(end_time, '%Y-%m-%d %H:%M:%S')

    ''' eq, gt, hash methods removed for clarity '''

I'll receive a list of intervals sorted by start_time in ascending order, like so:

intervals = [DateInterval(start_time='2015-01-01 08:00:00', end_time='2015-01-01 08:30:00'),
             DateInterval(start_time='2015-01-01 08:00:00', end_time='2015-01-01 10:00:00'),
             DateInterval(start_time='2015-01-01 09:00:00', end_time='2015-01-01 11:00:00'),
             DateInterval(start_time='2015-01-01 10:00:00', end_time='2015-01-01 12:00:00'),
             DateInterval(start_time='2015-01-01 13:00:00', end_time='2015-01-01 16:00:00'),
             DateInterval(start_time='2015-01-01 14:00:00', end_time='2015-01-01 17:00:00'),
             DateInterval(start_time='2015-01-01 15:00:00', end_time='2015-01-01 18:00:00'),
             DateInterval(start_time='2015-01-01 20:00:00', end_time='2015-01-01 22:00:00'),
             DateInterval(start_time='2015-01-01 20:00:00', end_time='2015-01-01 22:00:00')
             ]

(In this example list, the start and end times always fall exactly on the hour; however, they could fall on any second, or possibly any millisecond.) After searching the many Stack Overflow questions about overlapping intervals, I found the interval tree to be unsuitable for date intervals.

My lightly optimized brute-force method consists of three steps:

  1. Identify all non-unique sets of intervals in which at least one interval overlaps all the other intervals in that set
  2. Deduplicate the results of step 1 to find all unique sets of intervals in which at least one interval overlaps all the other intervals in that set
  3. From the results of step 2, keep only those sets in which each interval overlaps all other intervals in that set

1.

The following finds all non-unique sets in which at least one interval overlaps every other interval in that set, by naively comparing each interval in the list to all the other intervals. It assumes the list of intervals is sorted by start time in ascending order, which enables the break optimization:

def search(intervals, start_date, end_date):
    results = []
    for interval in intervals:
        if end_date >= interval.start_time:
            if start_date <= interval.end_time:
                results.append(interval)
        else:
            break # This assumes intervals are sorted by date time ascending
    return results

search is used like so:

brute_overlaps = []
for interval in intervals:
    brute_overlaps.append(search(intervals, interval.start_time, interval.end_time))
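To see what step 1 produces, here is a sketch that runs the same `search` (including its `return results`) over the integer example. `Interval` is a hypothetical namedtuple standing in for `DateInterval`, with integers in place of datetimes:

```python
from collections import namedtuple

# Hypothetical stand-in for DateInterval, using integers instead of datetimes.
Interval = namedtuple('Interval', ['start_time', 'end_time'])


def search(intervals, start_date, end_date):
    results = []
    for interval in intervals:
        if end_date >= interval.start_time:
            if start_date <= interval.end_time:
                results.append(interval)
        else:
            break  # sorted by start_time: every later interval starts too late
    return results


intervals = [Interval(1, 3), Interval(2, 4), Interval(4, 5),
             Interval(5, 7), Interval(6, 8)]
brute_overlaps = [search(intervals, i.start_time, i.end_time) for i in intervals]
for row in brute_overlaps:
    print([tuple(i) for i in row])
```

The row for {4,5} is [(2, 4), (4, 5), (5, 7)]: {4,5} overlaps both neighbors, but {2,4} and {5,7} do not overlap each other, which is why step 3 is needed.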

2.

The following deduplicates the list of sets:

def uniq(l):
    last = object()
    for item in l:
        if item == last:
            continue
        yield item
        last = item

def sort_and_deduplicate(l):
    return list(uniq(sorted(l, reverse=True)))
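`uniq` only drops an item when it immediately follows an identical one, which is why `sort_and_deduplicate` sorts first. A sketch using lists of integer tuples as stand-in sets:

```python
def uniq(l):
    last = object()  # fresh sentinel that compares unequal to every real item
    for item in l:
        if item == last:
            continue  # same as the item just yielded: skip it
        yield item
        last = item


def sort_and_deduplicate(l):
    # Sorting puts equal items next to each other, so uniq can drop them.
    return list(uniq(sorted(l, reverse=True)))


sets = [[(1, 3), (2, 4)], [(2, 4), (4, 5)], [(1, 3), (2, 4)]]
print(list(uniq(sets)))            # duplicates not adjacent: nothing removed
print(sort_and_deduplicate(sets))  # [[(2, 4), (4, 5)], [(1, 3), (2, 4)]]
```

If the sets are stored as hashable values (for example tuples or frozensets instead of lists), `list(dict.fromkeys(sets))` also removes duplicates without requiring the items to be orderable, and it keeps the first-seen order.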

3.

Finally, the following finds all sets in which each interval overlaps all other intervals in that set, by naively comparing each interval in a set to every other interval in that set:

def all_overlap(overlaps):
    results = []
    for overlap in overlaps:
        is_overlap = True
        for interval in overlap:
            for other_interval in [o for o in overlap if o != interval]:
                if not (interval.end_time >= other_interval.start_time and interval.start_time <= other_interval.end_time):
                    is_overlap = False
                    break    # If one interval fails
            else:            # break out of
                continue     # both inner for loops
            break            # and try next overlap

        if is_overlap: # all intervals in this overlap set overlap with each other
            results.append(overlap)
    return results
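The same pairwise check can be written more compactly with `itertools.combinations`, which visits each pair of positions once and so needs neither the `o != interval` filter nor the flag-and-break bookkeeping. A sketch, again using a hypothetical `Interval` namedtuple in place of `DateInterval`:

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical stand-in for DateInterval, using integers instead of datetimes.
Interval = namedtuple('Interval', ['start_time', 'end_time'])


def intervals_overlap(a, b):
    # Closed intervals: sharing an endpoint counts as overlapping.
    return a.start_time <= b.end_time and b.start_time <= a.end_time


def all_overlap(overlaps):
    # Keep a candidate set only if every pair of its members overlaps.
    return [overlap for overlap in overlaps
            if all(intervals_overlap(a, b) for a, b in combinations(overlap, 2))]


candidates = [[Interval(1, 3), Interval(2, 4), Interval(4, 5)],
              [Interval(4, 5), Interval(5, 7)]]
print(all_overlap(candidates))
```

The first candidate is rejected because {1,3} and {4,5} do not overlap; the second is kept.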

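A set of intervals in which each interval overlaps every other one will always have a common point that all of them contain: the latest start time is no later than the earliest end time, because the interval that starts last overlaps the interval that ends first. Conversely, querying all the intervals at a single point gives you a set of mutually overlapping intervals.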
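On a line, pairwise overlap and a shared point are therefore equivalent, and the latest start is such a point whenever one exists. A sketch, assuming closed (start, end) tuples, that turns this into a linear-time test which could stand in for the pairwise comparison in step 3:

```python
def common_point(intervals):
    """Return a point contained in every closed (start, end) interval, or None."""
    latest_start = max(start for start, _ in intervals)
    earliest_end = min(end for _, end in intervals)
    # Every interval starts at or before latest_start and ends at or after
    # earliest_end, so latest_start is shared exactly when it <= earliest_end.
    return latest_start if latest_start <= earliest_end else None


def mutually_overlap(intervals):
    return common_point(intervals) is not None


print(common_point([(1, 3), (2, 4)]))          # 2
print(common_point([(2, 4), (4, 5)]))          # 4
print(common_point([(2, 4), (4, 5), (5, 7)]))  # None: (2, 4) and (5, 7) are disjoint
```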

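With that in mind, your problem reduces to: "What are the distinct subsets of intervals I can get by changing the point I'm querying at?" An easy way to get all of those distinct subsets is to find the locations of events where the set of overlapping intervals can change, and query at a point between each pair of consecutive events. Because your intervals are closed (your checks use >= and <=, so {2,4} and {4,5} overlap), you also need to query at the event locations themselves.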
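A direct, brute-force sketch of that reduction, assuming closed (start, end) intervals: it queries at every event location and at a midpoint between each pair of consecutive locations, and records the distinct non-empty sets of interval indices it finds. Indices are used so that identical intervals, like the two 20:00-22:00 entries in the question, stay distinct. It costs O(n²), so it only illustrates the idea:

```python
def sets_at_query_points(intervals):
    """Distinct non-empty sets of interval indices found by point queries.

    intervals: closed (start, end) pairs. Indices keep identical intervals apart.
    """
    coords = sorted({x for start, end in intervals for x in (start, end)})
    # Query at every event location and at the midpoint of each gap between them.
    # a + (b - a) / 2 works for numbers and for datetimes (timedelta / 2).
    points = coords + [a + (b - a) / 2 for a, b in zip(coords, coords[1:])]
    found = []
    for p in sorted(points):
        active = frozenset(i for i, (s, e) in enumerate(intervals) if s <= p <= e)
        if active and active not in found:
            found.append(active)
    return found


found = sets_at_query_points([(1, 3), (2, 4), (4, 5), (5, 7), (6, 8)])
maximal = [s for s in found if not any(s < t for t in found)]
print(maximal)
```

On the integer example `found` holds nine sets, including single intervals such as {0}; keeping only the sets not contained in another leaves the index sets {0, 1}, {1, 2}, {2, 3} and {3, 4}, which are the four pairs from the question.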

In the case of intervals, the events are simply the points where any interval starts or ends. So you just scan over the interval starts and ends from left to right, tracking the set of intervals that have started but not yet ended. Every maximal mutually-overlapping subset appears as one of those tracked sets, although smaller subsets appear too (for example, a single interval that is left after its neighbor ends).
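A sketch of the sweep in Python that keeps only the maximal sets, assuming closed (start, end) intervals given as a list. Two choices here go beyond the description above and are assumptions: at equal positions, starts are processed before ends so that touching intervals count as overlapping, and a set is recorded only when an end event directly follows a start event, which is exactly when the active set cannot grow any further:

```python
def maximal_overlapping_sets(intervals):
    """Maximal sets of mutually overlapping closed (start, end) intervals.

    Results are frozensets of indices, so identical intervals stay distinct.
    """
    # Events are (position, kind, index); kind 0 (start) sorts before
    # kind 1 (end), so intervals that merely touch are active together.
    events = sorted([(s, 0, i) for i, (s, e) in enumerate(intervals)] +
                    [(e, 1, i) for i, (s, e) in enumerate(intervals)])
    active = set()
    results = []
    last_was_start = False
    for _, kind, i in events:
        if kind == 0:
            active.add(i)
            last_was_start = True
        else:
            # An end right after a start: the active set is maximal here.
            if last_was_start:
                results.append(frozenset(active))
            active.discard(i)
            last_was_start = False
    return results


print(maximal_overlapping_sets([(1, 3), (2, 4), (4, 5), (5, 7), (6, 8)]))
```

On the integer example this returns the index sets {0, 1}, {1, 2}, {2, 3} and {3, 4}. With the question's date intervals converted to (start_time, end_time) pairs it returns {0, 1}, {1, 2, 3}, {4, 5, 6} and {7, 8}; {1, 2, 3} appears because the 08:00-10:00 and 10:00-12:00 intervals touch at 10:00.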

In pseudo-code...

maximalMutuallyOverlappingSubsets =
    intervals
    .flatMap(e => [(e.start, e, true),
                   (e.end, e, false)])
    .sortedBy(e => e[0])
    .scan({}, (prevSet, (x, interval, add)) =>
        if add
        then prevSet + {interval}
        else prevSet - {interval})
    .distinct() - {{}}

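Sorting the 2n events takes O(n lg n) time. The scan itself does one set update per event, but emitting every intermediate set adds time proportional to the total size of those sets, which can reach O(n²) when many intervals overlap at once (for example, n intervals that all share a common point).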

If you're not familiar with them: flatMap projects each item of a list into a collection and then concatenates the items of all those collections. Scan starts with the given accumulator and repeatedly combines the next item into it with the given function, yielding each intermediate result.
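A fairly literal Python rendering of the pseudocode, as a sketch: `itertools.chain.from_iterable` plays the role of flatMap, `itertools.accumulate` (with its `initial` argument, Python 3.8+) plays the role of scan, and `dict.fromkeys` performs the distinct step while keeping order. As an assumption not stated in the pseudocode, events at the same position are ordered starts-first, and each event carries an index so identical intervals stay distinct:

```python
from itertools import accumulate, chain


def mutually_overlapping_subsets(intervals):
    """Every distinct non-empty set of active interval indices (Python 3.8+)."""
    # flatMap: each interval becomes a start event (kind 0) and an end event (kind 1).
    events = chain.from_iterable(
        [(start, 0, i), (end, 1, i)] for i, (start, end) in enumerate(intervals))

    # scan step: add the index on a start, remove it on an end.
    def step(active, event):
        _, kind, i = event
        return (active | {i}) if kind == 0 else (active - {i})

    # Sorting by (position, kind) processes starts before ends at a tie.
    snapshots = accumulate(sorted(events), step, initial=frozenset())
    # distinct() minus the empty set; dict.fromkeys keeps first-seen order.
    return [s for s in dict.fromkeys(snapshots) if s]


print(mutually_overlapping_subsets([(1, 3), (2, 4), (4, 5), (5, 7), (6, 8)]))
```

Like the pseudocode, this returns every distinct active set rather than only the maximal ones: on the integer example it yields nine sets, including singletons such as {0}.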

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 