
Max coverage disjoint intervals

Assume you have k <= 10^5 intervals [a_i, b_i] contained in [1, 10^18] (some of them may overlap), and you need to choose a set of mutually disjoint intervals whose union is as large as possible. The goal is not the maximum number of disjoint intervals; the chosen intervals must cover the most.

Trying all 2^k possible subsets is infeasible. Greedy approaches did not work, neither ordering by a_i (the interval covering algorithm) nor ordering by b_i (the maximum number of disjoint intervals algorithm). I can't figure out whether there is a dynamic programming solution. Given the size of the input, I think the solution should be O(k log k) or O(k).

Examples:

  1. [1,4], [3,5], [5,9], [7,18]. Solution: [3,5] u [7,18]

  2. [1,2], [2,6], [3,4], [5,7]. Solution: [1,2] u [3,4] u [5,7]

  3. [2,30], [25,39], [30,40]. Solution: [2,30]

The problem can be solved in O(k log(k)).

First sort the intervals by their upper bounds (the b_i's). Let I(1), I(2), ..., I(k) be the list of sorted intervals. That is,

b_1 <= b_2 <= ... <= b_k

Denote by w(i) the length of interval I(i). That is,

w(i) = b_i - a_i

Denote by f(i) the total length of the optimal solution among those whose last interval is I(i). That is, the solution corresponding to f(i) is a set which:

  1. contains the interval I(i)
  2. doesn't contain any interval whose upper bound is above b_i
  3. has the maximum cover among the sets of (non-overlapping) intervals satisfying 1 and 2

Now we compute f(1), f(2), ..., f(k) and return the maximum of them all. Clearly, the optimal solution corresponds to one of the f(i)'s, and therefore the maximal f(i) is the value of the optimal solution.

To compute each f(i) we use dynamic programming. We rely on the following recurrence relation, where a maximum over an empty set contributes nothing (it counts as 0):

f(i) = w(i) + max{f(j) | b_j < a_i}
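To make the recurrence concrete, here is a minimal Python sketch that evaluates it literally. The function name `max_cover_quadratic` and the representation of intervals as `(a, b)` tuples are assumptions made for illustration, not part of the answer. Because it scans all earlier intervals for each i, this version runs in O(k^2); it is meant as a readable reference, not as the O(k log(k)) method.

```python
def max_cover_quadratic(intervals):
    """Evaluate f(i) = w(i) + max{f(j) | b_j < a_i} directly, in O(k^2)."""
    # Sort by upper bound so that b_1 <= b_2 <= ... <= b_k.
    ordered = sorted(intervals, key=lambda iv: iv[1])
    f = []
    for i, (a_i, b_i) in enumerate(ordered):
        # Best f(j) among earlier intervals that end strictly before a_i.
        # The maximum over an empty set is taken to be 0.
        best_prev = max(
            (f[j] for j in range(i) if ordered[j][1] < a_i),
            default=0,
        )
        f.append((b_i - a_i) + best_prev)
    # The answer is the largest f(i); an empty input covers nothing.
    return max(f, default=0)
```

Only intervals earlier in the sorted order need to be scanned, since b_j < a_i <= b_i forces I(j) to come before I(i).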

I'll demonstrate the computation with your first input example:

I(1)=[1, 4],  w(1)=3
I(2)=[3, 5],  w(2)=2
I(3)=[5, 9],  w(3)=4
I(4)=[7, 18], w(4)=11

We compute f(i) for i = 1, 2, 3, 4:

f(1) = w(1) + max{None} = 3 
    f(1) intervals: {I(1)}

f(2) = w(2) + max{None} = 2 
    f(2) intervals: {I(2)}

f(3) = w(3) + max{f(1)} = 4 + 3 = 7
    f(3) intervals = {I(1), I(3)}

f(4) = w(4) + max{f(1), f(2)} = 11 + f(1) = 11 + 3 = 14 
    f(4) intervals = {I(1), I(4)}

The maximum f(i) is f(4), which corresponds to the set of intervals {I(1), I(4)}, the optimal solution.
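The answer does not spell out how to evaluate the max in the recurrence fast enough to reach O(k log(k)). One way, sketched below under assumptions of my own, is to binary search the sorted upper bounds for the number of intervals with b_j < a_i and to keep a running "best so far" index over the prefix. The names `max_cover`, `parent` and `best_upto` are hypothetical; the back pointers are only there to recover the chosen intervals, as in the worked example.

```python
from bisect import bisect_left


def max_cover(intervals):
    """Return (total length, chosen intervals) for the recurrence f(i)."""
    ordered = sorted(intervals, key=lambda iv: iv[1])  # b_1 <= ... <= b_k
    k = len(ordered)
    if k == 0:
        return 0, []
    ends = [b for _, b in ordered]
    f = [0] * k
    parent = [-1] * k     # predecessor of I(i) in the solution behind f(i)
    best_upto = [0] * k   # index of the largest f among positions 0..i
    for i, (a, b) in enumerate(ordered):
        # Intervals with b_j < a occupy positions 0..cnt-1 (and cnt <= i).
        cnt = bisect_left(ends, a)
        f[i] = b - a
        if cnt > 0:
            parent[i] = best_upto[cnt - 1]
            f[i] += f[parent[i]]
        if i == 0 or f[i] > f[best_upto[i - 1]]:
            best_upto[i] = i
        else:
            best_upto[i] = best_upto[i - 1]
    # Walk the back pointers from the best endpoint to list the intervals.
    i = best_upto[-1]
    total = f[i]
    chosen = []
    while i != -1:
        chosen.append(ordered[i])
        i = parent[i]
    chosen.reverse()
    return total, chosen
```

On the first example, `max_cover([(1, 4), (3, 5), (5, 9), (7, 18)])` returns `(14, [(1, 4), (7, 18)])`, matching f(4) above. Sorting dominates the cost, and each interval then needs one binary search.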

There seems to be an O(k * log(k)) solution. It can be achieved with a segment tree data structure.

First we populate an array endPos with the segment endings and sort it. For each segment we memorise its corresponding endPos index: let endPosIdx be an array such that endPosIdx_j stores the index in endPos where the j-th segment ends.

Next we introduce a segment tree. It will process the following requests:
1. getMax(i) - get the maximum value on the range [0, i].
2. update(i, value) - update the maximum at the i-th position with value.
Here i is an index in the endPos array. Calling getMax(i), we ask what maximum cover we can achieve if none of the segments ends after endPos_i. Calling update(i, value), we say that there now exists a cover of length value ending at endPos_i.

Sort all segments in increasing order of their starting position a_j and process them in that order. The gist is to find the largest cover given that the current segment is definitely taken into the resulting set. That cover equals the length of the current segment plus the maximum cover of the segments ending before the current one starts. Let j be the index of the current segment (they are sorted by start position). Let i be the largest index such that endPos_i ≤ a_j (i may be found from j by binary search). Then we can find

cover_j = length_j + getMax(i)

Next we update the segment tree by calling update(endPosIdx_j, cover_j) and proceed to the next segment.

After all the segments have been processed, the solution can be found by calling getMax(size(endPos) - 1), that is, by taking the maximum over the whole endPos array.
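The steps above can be sketched in Python as follows. This is one possible reading, not the answerer's code: the class `MaxSegmentTree`, the function `max_cover_segment_tree` and the `allow_touching` flag are hypothetical names. The text uses endPos_i ≤ a_j, but the question's third example treats [2,30] and [30,40] as overlapping, so the sketch assumes a strict comparison by default and keeps the ≤ variant behind the flag. The endPosIdx lookup is done here by binary search rather than a stored array, which is also an assumption.

```python
from bisect import bisect_left, bisect_right


class MaxSegmentTree:
    """Iterative max segment tree over n positions, all initially 0."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (2 * n)

    def update(self, i, value):
        # Raise position i to at least `value`, then fix the ancestors.
        i += self.n
        while i >= 1 and self.tree[i] < value:
            self.tree[i] = value
            i //= 2

    def get_max(self, i):
        # Maximum over positions [0, i]; an empty range (i < 0) gives 0.
        lo, hi = self.n, i + self.n + 1  # half-open leaf range [lo, hi)
        best = 0
        while lo < hi:
            if lo & 1:
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best


def max_cover_segment_tree(segments, allow_touching=False):
    if not segments:
        return 0
    end_pos = sorted(b for _, b in segments)
    tree = MaxSegmentTree(len(end_pos))
    for a, b in sorted(segments):  # increasing start position
        # Largest index i with endPos_i < a (or <= a if touching is allowed).
        if allow_touching:
            i = bisect_right(end_pos, a) - 1
        else:
            i = bisect_left(end_pos, a) - 1
        cover = (b - a) + tree.get_max(i)
        # Equal endings share one slot: the first index holding that value.
        tree.update(bisect_left(end_pos, b), cover)
    return tree.get_max(len(end_pos) - 1)
```

With the strict comparison, the three examples from the question give 14, 4 and 28. With `allow_touching=True`, the third example gives 38, because [2,30] and [30,40] may then be combined.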
