How to divide a set of overlapping ranges into non-overlapping ranges?

Let's say you have a set of ranges:

  • 0 - 100: 'a'
  • 0 - 75: 'b'
  • 95 - 150: 'c'
  • 120 - 130: 'd'

Obviously, these ranges overlap at certain points. How would you dissect these ranges to produce a list of non-overlapping ranges, while retaining the information associated with their original range (in this case, the letter after the range)?

For example, the results of the above after running the algorithm would be:

  • 0 - 75: 'a', 'b'
  • 76 - 94: 'a'
  • 95 - 100: 'a', 'c'
  • 101 - 119: 'c'
  • 120 - 130: 'c', 'd'
  • 131 - 150: 'c'
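
For small integer ranges, one way to pin down the expected result is a brute-force reference: label every integer point with the set of symbols covering it, then merge neighbouring points that carry the same label set. This is only a sketch for checking other solutions, assuming inclusive integer bounds; the name `split_brute_force` and the `(start, stop, symbol)` tuple shape are made up for illustration.

```python
def split_brute_force(intervals):
    """Reference splitter for small inclusive integer ranges.

    intervals: list of (start, stop, symbol) tuples (assumed input shape).
    Returns a list of (start, stop, frozenset_of_symbols) tuples.
    """
    if not intervals:
        return []
    lo = min(start for start, _, _ in intervals)
    hi = max(stop for _, stop, _ in intervals)

    result = []
    run_start = None
    run_labels = frozenset()
    # Walk one position past the end so the final run is flushed.
    for point in range(lo, hi + 2):
        labels = frozenset(sym for start, stop, sym in intervals
                           if start <= point <= stop)
        if labels != run_labels:
            if run_labels:  # close the previous run unless it was a gap
                result.append((run_start, point - 1, run_labels))
            run_start = point
            run_labels = labels
    return result


example = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
for start, stop, labels in split_brute_force(example):
    print('%d - %d: %s' % (start, stop, ', '.join(sorted(labels))))
```

It visits every integer between the smallest start and the largest stop, so it is only practical as a test oracle, not for large or non-integer ranges.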

I had the same question when writing a program to mix (partly overlapping) audio samples.

What I did was add a "start event" and a "stop event" (for each item) to a list, sort the list by time point, and then process it in order. You could do the same, except using an integer point instead of a time, and instead of mixing sounds you'd be adding symbols to the set corresponding to a range. Whether you generate empty ranges or just omit them is up to you.

Edit: Perhaps some code...

# intervals = list of (start, stop, symbol) tuples with inclusive bounds
intervals = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]

points = []  # list of (offset, plus/minus, symbol) tuples
for start, stop, symbol in intervals:
    points.append((start, '+', symbol))
    # stop is inclusive, so the symbol drops out at stop + 1
    points.append((stop + 1, '-', symbol))
points.sort()

ranges = []  # output list of (start, stop, symbol_set) tuples
current_set = set()
last_start = None
for offset, pm, symbol in points:
    # Close the range that ends just before this offset; skip empty ranges
    if current_set and last_start < offset:
        # Store a copy, because current_set keeps changing
        ranges.append((last_start, offset - 1, set(current_set)))
    if pm == '+':
        current_set.add(symbol)
    else:
        current_set.remove(symbol)
    last_start = offset

Totally untested, obviously.

I'd say create a list of the endpoints and sort it, and also index the list of ranges by starting and ending points. Then iterate through the list of sorted endpoints, and for each one, check the ranges to see which ones are starting/stopping at that point.

This is probably better represented in code... if your ranges are represented by tuples:

ranges = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
# Breakpoints: every start, plus the first position after every inclusive end
endpoints = sorted({r[0] for r in ranges} | {r[1] + 1 for r in ranges})
start = {e: set() for e in endpoints}
end = {e: set() for e in endpoints}
for r in ranges:
    start[r[0]].add(r[2])
    end[r[1] + 1].add(r[2])
current_ranges = set()
for e1, e2 in zip(endpoints[:-1], endpoints[1:]):
    current_ranges.difference_update(end[e1])
    current_ranges.update(start[e1])
    if current_ranges:  # skip gaps that no range covers
        print('%d - %d: %s' % (e1, e2 - 1, ','.join(sorted(current_ranges))))

Although looking at this in retrospect, I'd be surprised if there wasn't a more efficient (or at least cleaner-looking) way to do it.

What you describe is an example of set theory. For a general algorithm for computing unions, intersections, and differences of sets see:

www.gvu.gatech.edu/~jarek/graphics/papers/04PolygonBooleansMargalit.pdf

While the paper is targeted at graphics, it is applicable to general set theory as well. Not exactly light reading material.

An answer similar to Edmunds's, tested, including support for intervals like (1,1):

import pprint


class MultiSet(object):
    def __init__(self, intervals):
        self.intervals = intervals
        self.events = None

    def split_ranges(self):
        self.events = []
        for start, stop, symbol in self.intervals:
            self.events.append((start, True, stop, symbol))
            self.events.append((stop, False, start, symbol))

        def event_key(event):
            key_endpoint, key_is_start, key_other, _ = event
            key_order = 0 if key_is_start else 1
            return key_endpoint, key_order, key_other

        self.events.sort(key=event_key)

        current_set = set()
        ranges = []
        current_start = -1

        for endpoint, is_start, other, symbol in self.events:
            if is_start:
                if current_start != -1 and endpoint != current_start and \
                       endpoint - 1 >= current_start and current_set:
                    ranges.append((current_start, endpoint - 1, current_set.copy()))
                current_start = endpoint
                current_set.add(symbol)
            else:
                if current_start != -1 and endpoint >= current_start and current_set:
                    ranges.append((current_start, endpoint, current_set.copy()))
                current_set.remove(symbol)
                current_start = endpoint + 1

        return ranges


if __name__ == '__main__':
    intervals = [
        (0, 100, 'a'), (0, 75, 'b'), (75, 80, 'd'), (95, 150, 'c'), 
        (120, 130, 'd'), (160, 175, 'e'), (165, 180, 'a')
    ]
    multiset = MultiSet(intervals)
    pprint.pprint(multiset.split_ranges())


[(0, 74, {'b', 'a'}),
 (75, 75, {'d', 'b', 'a'}),
 (76, 80, {'d', 'a'}),
 (81, 94, {'a'}),
 (95, 100, {'c', 'a'}),
 (101, 119, {'c'}),
 (120, 130, {'d', 'c'}),
 (131, 150, {'c'}),
 (160, 164, {'e'}),
 (165, 175, {'e', 'a'}),
 (176, 180, {'a'})]
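
A quick sanity check on output of this shape: the pieces should be sorted, each should have start <= stop and a non-empty symbol set, and no piece should begin at or before the end of the previous one. The helper below is a sketch, not part of the answer above; `check_disjoint` is a hypothetical name, and it assumes inclusive integer bounds.

```python
def check_disjoint(pieces):
    """Return True if (start, stop, symbols) pieces are sorted and non-overlapping."""
    for start, stop, symbols in pieces:
        # Each piece must be non-empty and carry at least one symbol.
        if start > stop or not symbols:
            return False
    for prev, cur in zip(pieces, pieces[1:]):
        # With inclusive bounds, the next piece must start after the previous stop.
        if cur[0] <= prev[1]:
            return False
    return True


output = [
    (0, 74, {'b', 'a'}), (75, 75, {'d', 'b', 'a'}), (76, 80, {'d', 'a'}),
    (81, 94, {'a'}), (95, 100, {'c', 'a'}), (101, 119, {'c'}),
    (120, 130, {'d', 'c'}), (131, 150, {'c'}), (160, 164, {'e'}),
    (165, 175, {'e', 'a'}), (176, 180, {'a'}),
]
print(check_disjoint(output))
```

This only checks that the pieces do not overlap; it does not confirm that the symbol sets are the right ones.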

Pseudocode:

unusedRanges = [ (each of your ranges) ]
rangesInUse = []
usedRanges = []
beginningBoundary = nil

boundaries = [ list of all your ranges' start and end values, sorted ]
resultRanges = []

for (boundary in boundaries) {
    rangesStarting = []
    rangesEnding = []

    // determine which ranges begin at this boundary
    for (range in unusedRanges) {
        if (range.begin == boundary) {
            rangesStarting.add(range)
        }
    }

    // if there are any new ones, start a new range
    if (rangesStarting isn't empty) {
        if (beginningBoundary isn't nil) {
            // add the range we just passed
            resultRanges.add(beginningBoundary, boundary - 1, [collected values from rangesInUse])
        }

        // note that we are starting a new range
        beginningBoundary = boundary

        for (range in rangesStarting) {
            rangesInUse.add(range)
            unusedRanges.remove(range)
        }
    }

    // determine which ranges end at this boundary
    for (range in rangesInUse) {
        if (range.end == boundary) {
            rangesEnding.add(range)
        }
    }

    // if any ranges are ending, stop the range
    if (rangesEnding isn't empty) {
        // add the range up to this boundary
        resultRanges.add(beginningBoundary, boundary, [collected values from rangesInUse])

        for (range in rangesEnding) {
            usedRanges.add(range)
            rangesInUse.remove(range)
        }

        if (rangesInUse isn't empty) {
            // some ranges didn't end; note that we are starting a new range
            beginningBoundary = boundary + 1
        }
        else {
            beginningBoundary = nil
        }
    }
}

Unit test:

At the end, resultRanges should have the results you're looking for, unusedRanges and rangesInUse should be empty, beginningBoundary should be nil, and usedRanges should contain what unusedRanges used to contain (but sorted by range.end).
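
The pseudocode and the checks just described could be rendered in Python roughly as follows. This is a sketch under assumptions: ranges are `(begin, end, value)` tuples with inclusive integer bounds, the function name `split_with_bookkeeping` is invented here, and one guard is added that the pseudocode does not have (it skips a piece that would be empty when a range starts immediately after another one ends).

```python
def split_with_bookkeeping(ranges):
    """Python rendering of the pseudocode; ranges are (begin, end, value) tuples."""
    unused = list(ranges)
    in_use = []
    used = []
    beginning = None
    results = []

    # All start and end values, sorted, with duplicates dropped.
    boundaries = sorted({b for r in ranges for b in (r[0], r[1])})
    for boundary in boundaries:
        starting = [r for r in unused if r[0] == boundary]
        if starting:
            # Added guard (not in the pseudocode): skip an empty piece.
            if beginning is not None and beginning <= boundary - 1:
                results.append((beginning, boundary - 1, {r[2] for r in in_use}))
            beginning = boundary
            for r in starting:
                in_use.append(r)
                unused.remove(r)

        ending = [r for r in in_use if r[1] == boundary]
        if ending:
            # The piece up to and including this boundary.
            results.append((beginning, boundary, {r[2] for r in in_use}))
            for r in ending:
                used.append(r)
                in_use.remove(r)
            beginning = boundary + 1 if in_use else None

    return results, unused, in_use, used, beginning


example = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
results, unused, in_use, used, beginning = split_with_bookkeeping(example)

# The end-state checks described above.
assert unused == [] and in_use == []
assert beginning is None
assert sorted(used) == sorted(example)
assert [r[1] for r in used] == sorted(r[1] for r in example)
print(results)
```

Like the pseudocode, this rescans the range lists at every boundary, so it is quadratic in the number of ranges.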
