
How to get all maximal non-overlapping sets of spans from a list of spans

I can't seem to find a way to write the algorithm in the title without needing to curate the results in some way.

To illustrate what I want:

all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
possible_sets = [
    {(0, 5), (5, 8), (9, 10), (11, 15)},
    {(2, 7), (9, 10), (11, 15)},
    {(0, 5), (6, 10), (11, 15)}
]
not_possible = [
    {(0, 5), (5, 8), (6, 10), (11, 15)},  # has overlaps
    {(5, 8), (9, 10), (11, 15)}           # not maximal w.r.t possible_sets[0]
]

My current implementation is more or less this:

def has_overlap(a, b):
    return a[1] > b[0] and b[1] > a[0]

def combine(spans, current, idx=0):
    for i in range(idx, len(spans)):
        overlaps = {e for e in current if has_overlap(e, spans[i])}
        if overlaps:
            yield from combine(spans, current-overlaps, i)
        else:
            current.add(spans[i])
    yield current

But it produces non-maximal sets that I'd rather not create in the first place.

>>> for s in combine(all_spans, set()):
...     print(sorted(s))
[(9, 10), (11, 15)]
[(6, 10), (11, 15)]
[(5, 8), (9, 10), (11, 15)]
[(9, 10), (11, 15)]
[(6, 10), (11, 15)]
[(2, 7), (9, 10), (11, 15)]
[(0, 5), (9, 10), (11, 15)]
[(0, 5), (6, 10), (11, 15)]
[(0, 5), (5, 8), (9, 10), (11, 15)]

Is there a different approach that avoids this behavior? I found similar problems under the keywords "interval overlaps" and "activity scheduling", but none of them seemed to refer to this particular problem.

It depends on what you mean by not wanting to curate the results.

You can filter out the non-maximal results after using your generator with:

all_results = list(combine(all_spans, set()))

# Iterate over copies so that removing from all_results is safe.
for first_result in list(all_results):
    for second_result in list(all_results):
        # `<` on sets tests for a proper subset.
        if first_result < second_result:
            all_results.remove(first_result)
            break

To not produce them in the first place, you could do a check before yielding to see whether an answer is maximal. Something like:

def combine(spans, current, idx=0):
    for i in range(idx, len(spans)):
        overlaps = {e for e in current if has_overlap(e, spans[i])}
        if overlaps:
            yield from combine(spans, current-overlaps, i)
        else:
            current.add(spans[i])
    # Check whether the current set is maximal.
    possible_additions = set(spans)
    for item_to_consider in set(possible_additions):
        if any([has_overlap(item_in_current, item_to_consider) for item_in_current in current]):
            possible_additions.remove(item_to_consider)
    if len(possible_additions) == 0:
        yield current

This is a simple (?) graph problem. Make a directed graph where each span is a node. There is an edge AB (from node A to node B) iff A[1] <= B[0] -- in prose, if span B doesn't start until span A finishes. Your graph would look like

Node    =>  Successors
(0, 5)  =>  (5, 8), (6, 10), (9, 10), (11, 15)
(2, 7)  =>  (9, 10), (11, 15)
(5, 8)  =>  (9, 10), (11, 15)
(6, 10) =>  (11, 15)
(9, 10) =>  (11, 15)

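Now, the problem reduces to simply finding the longest path through the graph, including ties. Note that this yields only the sets of greatest size: a set such as {(2, 7), (9, 10), (11, 15)} is maximal in the question's sense (no span can be added to it) without being a longest path.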
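The successor table above can be built in a few lines. This is only a sketch: the name `build_successors` is made up for illustration, and it assumes every span has positive length (start < end), since a zero-length span would otherwise be listed as its own successor.

```python
def build_successors(spans):
    """Map each span A to every span B with A[1] <= B[0].

    Assumes each span satisfies start < end (an assumption, not stated
    in the question).
    """
    return {a: [b for b in spans if a[1] <= b[0]] for a in spans}


all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
graph = build_successors(all_spans)
for node, successors in graph.items():
    print(node, "=>", successors)
```

Unlike the table, the dictionary also contains (11, 15), mapped to an empty list, because nothing starts after it ends.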

Given the linearity of the problem, finding one maximal solution is easier: at each step, pick the successor node with the soonest ending time. In steps:

  1. To start, all nodes are available. The one with the soonest ending time is (0,5).
  2. The successor to (0,5) with the earliest end is (5, 8).
  3. The successor to (5,8) ... is (9, 10)
  4. ... and finally add (11, 15)

Note that this much doesn't require a graph; merely a structure you're willing to reference by either first or second sub-element.

The solution length is 4, as you already know.
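The greedy steps above might be sketched as follows. The function name `greedy_longest` is hypothetical. It sorts by end time once and keeps each span that starts no earlier than the end of the previous pick, which is the usual activity-selection approach rather than anything specific to this answer.

```python
def greedy_longest(spans):
    """Return one largest set of non-overlapping spans, in order."""
    chosen = []
    # Visiting spans by increasing end time means the first compatible
    # span we meet is always the successor with the earliest end.
    for span in sorted(spans, key=lambda s: s[1]):
        if not chosen or chosen[-1][1] <= span[0]:
            chosen.append(span)
    return chosen


all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
print(greedy_longest(all_spans))
# [(0, 5), (5, 8), (9, 10), (11, 15)]
```

This finds a single set of the greatest size; it does not enumerate the other maximal sets.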

Can you take it from here?
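One possible way to take it further is sketched below. It assumes "maximal" means what the question's example shows (no further span can be added without an overlap), not "longest", and that every span has positive length. The name `maximal_sets` is made up for this sketch. The idea is to walk the successor relation but only step to a span if no other available span would fit entirely before it, so every finished path is already maximal.

```python
def maximal_sets(spans):
    """Yield every maximal set of non-overlapping spans, as a sorted list.

    Assumes start < end for every span (an assumption of this sketch).
    """
    ordered = sorted(set(spans))

    def extend(path, last_end):
        # Spans that can still follow the path without overlapping it.
        candidates = [s for s in ordered if s[0] >= last_end]
        if not candidates:
            yield path
            return
        # A candidate that starts at or after the earliest candidate end
        # leaves room for another span before it, so choosing it next
        # would make the result non-maximal.
        earliest_end = min(s[1] for s in candidates)
        for s in candidates:
            if s[0] < earliest_end:
                yield from extend(path + [s], s[1])

    yield from extend([], float("-inf"))


all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
for result in maximal_sets(all_spans):
    print(result)
# [(0, 5), (5, 8), (9, 10), (11, 15)]
# [(0, 5), (6, 10), (11, 15)]
# [(2, 7), (9, 10), (11, 15)]
```

Each maximal set appears exactly once, so no filtering pass is needed afterwards. The candidate list is rebuilt with a linear scan at every step, which is fine for small inputs but could be replaced with a binary search on the sorted starts.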

Assuming ranges are sorted by lower bound, we'd like to append the current range to the longest paths it can be appended to, or create a new path (append to an empty path). If it's called for, we could consider making the search for the longest prefixes more efficient. (The code below just updates that search in a slightly optimised linear method.)

(I'm not sure how to use the yield functionality; perhaps you could make this code more elegant.)

# Assumes spans are sorted by lower bound
# and each tuple is a valid range
def f(spans):
  # Append the current span to the longest
  # paths it can be appended to.
  # Slicing (rather than spans.pop(0)) leaves the caller's list unchanged.
  paths = [[spans[0]]]
  for l, r in spans[1:]:
    to_extend = []
    longest = 0
    print("\nCandidate: %s" % ((l, r),))
    for path in paths:
      lp, rp = path[-1]
      print("Testing on %s" % ((lp, rp),))
      # The candidate overlaps the path's last span, so it can only
      # replace that span: remember the prefix as a possible new path.
      if lp <= l < rp:
        prefix = path[:-1]
        if len(prefix) >= longest:
          to_extend.append(prefix + [(l, r)])
          longest = len(prefix)
      # Otherwise, it's after so append it.
      else:
        print("Appending to path: %s" % path)
        path.append((l, r))
        longest = len(path)
    for path in to_extend:
      print("Candidate extensions: %s" % to_extend)
      if len(path) == longest + 1:
        print("Adding to total paths: %s" % path)
        paths.append(path)

  print("\nResult: %s" % paths)
  return paths

all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]

f(all_spans)

Output:

"""
Candidate: (2, 7)
Testing on (0, 5)
Candidate extensions: [[(2, 7)]]
Adding to total paths: [(2, 7)]

Candidate: (5, 8)
Testing on (0, 5)
Appending to path: [(0, 5)]
Testing on (2, 7)

Candidate: (6, 10)
Testing on (5, 8)
Testing on (2, 7)
Candidate extensions: [[(0, 5), (6, 10)]]
Adding to total paths: [(0, 5), (6, 10)]

Candidate: (9, 10)
Testing on (5, 8)
Appending to path: [(0, 5), (5, 8)]
Testing on (2, 7)
Appending to path: [(2, 7)]
Testing on (6, 10)

Candidate: (11, 15)
Testing on (9, 10)
Appending to path: [(0, 5), (5, 8), (9, 10)]
Testing on (9, 10)
Appending to path: [(2, 7), (9, 10)]
Testing on (6, 10)
Appending to path: [(0, 5), (6, 10)]

Result: [[(0, 5), (5, 8), (9, 10), (11, 15)],
         [(2, 7), (9, 10), (11, 15)],
         [(0, 5), (6, 10), (11, 15)]]
"""
