

Kth minimum in a Range

Original: 2014-02-17 10:07:34 | Tags: algorithm, data-structures, tree

Given an array of integers and some query operations.
The query operations are of two types:
1. Update the value at index i to x.
2. Given two integers, find the kth minimum in that range. (For example, if the two integers are i and j, we have to find the kth minimum between i and j, both inclusive.)
I can answer range minimum queries with a segment tree, but I could not do the same for the kth minimum. Can anyone help me?

4 answers

Here is an O(polylog n)-per-query solution that does not assume a constant k, so k can vary between queries. The main idea is to use a segment tree in which every node represents an interval of array indices and contains a multiset (balanced binary search tree) of the values in the represented array segment. The update operation is pretty straightforward:

1. Walk up the segment tree from the leaf (the array index you're updating). You will encounter all nodes whose interval of array indices contains the updated index. At every node, remove the old value from the multiset and insert the new value into it. Complexity: O(log^2 n).
2. Update the array itself.
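A minimal C++ sketch of this update. It assumes a bottom-up segment tree layout (leaves at `tree[size + i]`, parent of node `p` is `p / 2`) with one `std::multiset<int>` per node. The struct name `MultisetSegTree` is made up for illustration. The constructor simply inserts each value along its leaf-to-root path, which is O(n log^2 n) and therefore slower than the linear-merge build described next.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Segment tree of multisets in a bottom-up layout: leaves are tree[size + i],
// the parent of node p is p / 2, and tree[p] holds every value in p's interval.
struct MultisetSegTree {
    int size = 1;                          // number of leaves (power of two)
    std::vector<int> a;                    // the array itself
    std::vector<std::multiset<int>> tree;

    explicit MultisetSegTree(const std::vector<int>& values) : a(values) {
        while (size < (int)a.size()) size <<= 1;
        tree.resize(2 * size);
        // Simple O(n log^2 n) build: insert each value on its leaf-to-root path.
        for (int i = 0; i < (int)a.size(); ++i)
            for (int p = size + i; p >= 1; p >>= 1) tree[p].insert(a[i]);
    }

    // a[i] = x: walk up from leaf i through the O(log n) nodes whose interval
    // contains i, replacing one copy of the old value by x (O(log n) each).
    void update(int i, int x) {
        assert(0 <= i && i < (int)a.size());
        for (int p = size + i; p >= 1; p >>= 1) {
            tree[p].erase(tree[p].find(a[i]));  // erase(a[i]) would drop ALL copies
            tree[p].insert(x);
        }
        a[i] = x;  // finally, update the array itself
    }
};
```

The old value is erased through the iterator returned by `find`. This matters with duplicates, because `multiset::erase(value)` removes every copy of that value.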

Every array element lies in O(log n) multisets, so the total space usage is O(n log n). With linear-time merging of multisets we can also build the initial segment tree in O(n log n) (there's O(n) work per level).
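The linear-merge build could look like the sketch below. It uses the same hypothetical bottom-up layout, and `MultisetTree` and `build_tree` are illustrative names. It relies on the `std::multiset` range constructor, which the C++ standard guarantees to run in linear time when its input is already sorted. Each level therefore costs O(n), and the whole build costs O(n log n).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Same bottom-up layout: leaves at nodes[size + i], children of p are 2p, 2p+1.
struct MultisetTree {
    int size = 1;                            // number of leaves (power of two)
    std::vector<std::multiset<int>> nodes;
};

MultisetTree build_tree(const std::vector<int>& a) {
    assert(!a.empty());
    MultisetTree t;
    while (t.size < (int)a.size()) t.size <<= 1;
    t.nodes.resize(2 * t.size);
    for (int i = 0; i < (int)a.size(); ++i) t.nodes[t.size + i].insert(a[i]);
    for (int p = t.size - 1; p >= 1; --p) {
        const auto& l = t.nodes[2 * p];
        const auto& r = t.nodes[2 * p + 1];
        // Linear-time merge of the two sorted children...
        std::vector<int> merged;
        merged.reserve(l.size() + r.size());
        std::merge(l.begin(), l.end(), r.begin(), r.end(), std::back_inserter(merged));
        // ...then the range constructor, which is linear for sorted input.
        t.nodes[p] = std::multiset<int>(merged.begin(), merged.end());
    }
    return t;
}
```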

What about queries? We are given a range [i, j] and a rank k and want to find the k-th smallest element in a[i..j]. How do we do that?

1. Find a disjoint cover of the query range using the standard segment tree query procedure. We get O(log n) disjoint nodes, and the union of their multisets is exactly the multiset of values in the query range. Let's call those multisets s_1, ..., s_m (with m <= 2 ceil(log_2 n)). Finding the s_i takes O(log n) time.
2. Do a select(k) query on the union of s_1, ..., s_m. See below.
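Step 1 is the usual iterative segment tree traversal. Here is a sketch that assumes the same bottom-up layout with `size` a power of two. The helper name `cover` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Disjoint cover of [i, j] (inclusive) in a bottom-up segment tree with `size`
// leaves (a power of two) at node ids [size, 2 * size).
std::vector<int> cover(int size, int i, int j) {
    assert(0 <= i && i <= j && j < size);
    std::vector<int> nodes;
    // [l, r) is the still-uncovered part, expressed one level up each round.
    for (int l = i + size, r = j + size + 1; l < r; l >>= 1, r >>= 1) {
        if (l & 1) nodes.push_back(l++);  // l is a right child: its parent would overshoot left
        if (r & 1) nodes.push_back(--r);  // r - 1 is a left child: its parent would overshoot right
    }
    return nodes;
}
```

For example, `cover(8, 2, 5)` returns nodes 5 and 6, which cover indices 2..3 and 4..5.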

So how does the selection algorithm work? There is one really simple algorithm to do this.

We are given s_1, ..., s_m and k, and we want to find the smallest x in a such that s_1.rank(x) + ... + s_m.rank(x) >= k, where rank(x) returns the number of elements less than or equal to x in the respective BBST (this can be implemented in O(log n) if we store subtree sizes). Let's just use binary search to find x! We walk through the BBST of the root, which contains every value of a. At each step we do the m rank queries and check whether their sum is at least k. The predicate is monotone in x, so binary search works. The x found this way is the answer: the sum first reaches k at x, so x occurs in the query range and is its k-th smallest element.
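Here is a sketch of the selection step. To stay short and runnable with only the standard library, it assumes each s_i is a sorted `std::vector<int>`, where rank is a `std::upper_bound`. The dynamic structure needs an order-statistic BBST here, because `std::multiset` cannot rank in O(log n). `root` is assumed to hold every value of the array in sorted order, standing in for the root's BBST. `select_kth` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// select(k) (1-based) on the union of the sorted parts s_1..s_m; `root` holds
// every value of the array in sorted order and is the binary search domain.
int select_kth(const std::vector<std::vector<int>>& parts, const std::vector<int>& root, int k) {
    assert(!root.empty() && k >= 1);
    // rank_le(x): elements <= x over all parts, one O(log n) search per part.
    auto rank_le = [&](int x) {
        long long c = 0;
        for (const auto& s : parts) c += std::upper_bound(s.begin(), s.end(), x) - s.begin();
        return c;
    };
    // Smallest root value x with rank_le(x) >= k.
    int lo = 0, hi = (int)root.size() - 1;
    assert(rank_le(root[hi]) >= k);  // k must not exceed the union's size
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (rank_le(root[mid]) >= k) hi = mid; else lo = mid + 1;
    }
    return root[lo];
}
```

With m parts, each probe costs O(m log n) and there are O(log n) probes, which gives the O(log^3 n) per query stated below.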

Complexity: O(n log n) preprocessing and O(log^3 n) per query.

So in total we get a runtime of O(n log n + q log^3 n) for q queries. I'm sure we could get it down to O(q log^2 n) with a cleverer selection algorithm.

UPDATE: If we are looking for an offline algorithm that can process all queries at once, we can get O((n + q) * log n * log (q + n)) using the following algorithm:

Preprocess all queries and create the set of all values that ever occur in the array. There are at most q + n of them.
Build a segment tree, but this time not on the array: build it on the set of possible values.
Every node in the segment tree represents an interval of values and maintains the set of positions where these values occur.
To answer a query, start at the root of the segment tree. Check how many positions in the left child of the root lie in the query interval (we can do that with two searches in the BBST of positions). Let that number be m. If k <= m, recurse into the left child. Otherwise recurse into the right child, with k decremented by m.
For updates, remove the position from the O(log (q + n)) nodes that cover the old value and insert it into the nodes that cover the new value.

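The advantage of this approach is that the inner trees only store plain positions, so we can build on the standard library implementation of balanced binary search trees (e.g. set<int> in C++). One caveat: counting the positions that lie in [i, j] is still a rank query, and std::distance between two set<int> iterators is linear in the count. To get the logarithmic bound you still need subtree sizes, for example an order-statistic tree.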
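Here is a sketch of the offline variant. It assumes every value that will ever be written is known up front and passed in as `future_values`. `ValueTree` and its members are illustrative names. Each node of a bottom-up segment tree over the compressed values keeps a `std::set<int>` of positions. In this sketch the count step uses `std::distance`, which is linear in the number of positions counted. The descent and update logic are unchanged if the per-node sets are swapped for a structure with logarithmic rank queries.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Offline sketch: a bottom-up segment tree over the sorted distinct values,
// where node x keeps the set of array positions whose value lies in x's range.
struct ValueTree {
    std::vector<int> vals;            // every value that will ever occur, sorted
    std::vector<std::set<int>> pos;   // pos[x] = positions with a value in node x
    int size = 1;

    ValueTree(const std::vector<int>& a, const std::vector<int>& future_values) {
        vals = a;
        vals.insert(vals.end(), future_values.begin(), future_values.end());
        std::sort(vals.begin(), vals.end());
        vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
        while (size < (int)vals.size()) size <<= 1;
        pos.assign(2 * size, std::set<int>());
        for (int p = 0; p < (int)a.size(); ++p) insert(p, a[p]);
    }

    int leaf(int v) const {
        auto it = std::lower_bound(vals.begin(), vals.end(), v);
        assert(it != vals.end() && *it == v);  // v must have been declared up front
        return size + int(it - vals.begin());
    }
    // The O(log(n + q)) nodes covering value v are the ancestors of its leaf.
    void insert(int p, int v) { for (int x = leaf(v); x >= 1; x >>= 1) pos[x].insert(p); }
    void erase(int p, int v) { for (int x = leaf(v); x >= 1; x >>= 1) pos[x].erase(p); }

    void update(std::vector<int>& a, int p, int v) {  // a[p] = v
        erase(p, a[p]);
        insert(p, v);
        a[p] = v;
    }

    // Positions in [i, j] stored at node x: two searches in the set, but
    // std::distance between set iterators is linear in the result.
    int count(int x, int i, int j) const {
        return int(std::distance(pos[x].lower_bound(i), pos[x].upper_bound(j)));
    }

    // k-th smallest (1-based) of a[i..j]: descend from the root.
    int kth(int i, int j, int k) const {
        assert(0 <= i && i <= j && 1 <= k && k <= count(1, i, j));
        int x = 1;
        while (x < size) {
            int m = count(2 * x, i, j);  // how many fall in the left child
            if (k <= m) x = 2 * x;
            else { k -= m; x = 2 * x + 1; }
        }
        return vals[x - size];
    }
};
```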

We can turn this into an online algorithm by replacing the segment tree with a weight-balanced tree such as a BB[α] tree. It has logarithmic operations like other balanced binary search trees, but it also lets us rebuild an entire subtree from scratch when it becomes unbalanced, charging the rebuilding cost to the operations that must have caused the imbalance.
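To illustrate only the rebuilding idea, here is a sketch of a weight-balanced BST with partial rebuilding on plain integer keys. The balance rule is an assumption of this sketch, in the scapegoat-tree style: each child holds at most α = 0.75 of its parent's weight. This is not the exact BB[α] definition, and all names are illustrative. In the answer's scheme, each node would also carry its set of positions, which a rebuild would recompute. Only insertion is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Weight-balanced BST with partial rebuilding (sketch). A node is balanced when
// neither child holds more than kAlpha of its weight; kAlpha = 0.75 is arbitrary.
struct Node {
    int key;
    int weight = 1;  // number of nodes in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

const double kAlpha = 0.75;

int weight(const Node* t) { return t ? t->weight : 0; }

bool balanced(const Node* t) {
    return weight(t->left) <= kAlpha * t->weight && weight(t->right) <= kAlpha * t->weight;
}

void flatten(Node* t, std::vector<Node*>& out) {  // in-order traversal
    if (!t) return;
    flatten(t->left, out);
    out.push_back(t);
    flatten(t->right, out);
}

// Turn the sorted nodes[lo, hi) into a perfectly balanced subtree.
Node* build(std::vector<Node*>& nodes, int lo, int hi) {
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* t = nodes[mid];
    t->left = build(nodes, lo, mid);
    t->right = build(nodes, mid + 1, hi);
    t->weight = hi - lo;
    return t;
}

// Rebuild a whole subtree from scratch; this O(weight) cost is what gets
// charged to the insertions that unbalanced it.
Node* rebuild(Node* t) {
    assert(t != nullptr);
    std::vector<Node*> nodes;
    flatten(t, nodes);
    return build(nodes, 0, (int)nodes.size());
}

Node* insert(Node* t, int key) {
    if (!t) return new Node(key);
    if (key < t->key) t->left = insert(t->left, key);
    else t->right = insert(t->right, key);
    t->weight = 1 + weight(t->left) + weight(t->right);
    return balanced(t) ? t : rebuild(t);  // rebuild instead of rotating
}

int height(const Node* t) { return t ? 1 + std::max(height(t->left), height(t->right)) : 0; }
```

Inserting keys in sorted order, the worst case for a plain BST, keeps the height logarithmic. Every node's weight shrinks by at least a factor of 0.75 per level.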

If this is a programming contest problem, you might be able to get away with the following O(n log(n) + q n^0.5 log(n)^1.5)-time algorithm. It is designed to make good use of the C++ STL, and it has a much better big-O constant than Niklas's answer because it uses much less space and indirection.

Divide the array into k chunks of length n/k. Copy each chunk into the corresponding locations of a second array and sort it.

To update: copy the chunk that changed into the second array and sort it again (time O((n/k) log(n/k))).

To query: copy to a scratch array the at most 2 (n/k - 1) elements that belong to a chunk partially overlapping the query interval, and sort them. Use one of the answers to this question to select the element of the requested rank from the union of the sorted scratch array and the fully overlapping chunks, in time O(k log(n/k)^2).

In theory, the optimum setting of k is (n/log(n))^0.5. It's possible to shave off another log(n)^0.5 using the complicated algorithm of Frederickson and Johnson.
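Here is a sketch of this chunked scheme in C++. `ChunkedKth` is a hypothetical name. The chunk length B = n/k is passed in directly, and the rank is called k as in the question. For the selection step it swaps in a simple binary search over the `int` value range, counting elements <= x with `std::upper_bound` in the sorted scratch array and in each fully covered chunk. That is simpler than the selection algorithms the answer refers to, at the cost of about 32 counting rounds per query.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Chunks of length B; `sorted` holds each chunk sorted in place, at the same
// index range the chunk occupies in `a`.
struct ChunkedKth {
    std::vector<int> a, sorted;
    int B;

    ChunkedKth(const std::vector<int>& values, int chunk_len)
        : a(values), sorted(values), B(chunk_len) {
        assert(B >= 1);
        for (int s = 0; s < (int)a.size(); s += B)
            std::sort(sorted.begin() + s, sorted.begin() + std::min(s + B, (int)a.size()));
    }

    // Update: copy the changed chunk again and re-sort it, O(B log B).
    void update(int i, int x) {
        a[i] = x;
        int s = i / B * B, e = std::min(s + B, (int)a.size());
        std::copy(a.begin() + s, a.begin() + e, sorted.begin() + s);
        std::sort(sorted.begin() + s, sorted.begin() + e);
    }

    // k-th smallest (1-based) of a[i..j].
    int kth(int i, int j, int k) const {
        assert(0 <= i && i <= j && j < (int)a.size() && 1 <= k && k <= j - i + 1);
        std::vector<int> scratch;               // values from partially covered chunks
        std::vector<std::pair<int, int>> full;  // [s, e) of fully covered chunks
        for (int s = i / B * B; s <= j; s += B) {
            int e = std::min(s + B, (int)a.size());
            if (i <= s && e - 1 <= j) {
                full.push_back({s, e});
            } else {
                for (int p = std::max(s, i); p <= std::min(e - 1, j); ++p) scratch.push_back(a[p]);
            }
        }
        std::sort(scratch.begin(), scratch.end());
        // Number of elements <= x in the union of scratch and the full chunks.
        auto count_le = [&](long long x) {
            long long c = std::upper_bound(scratch.begin(), scratch.end(), x) - scratch.begin();
            for (const auto& ch : full)
                c += std::upper_bound(sorted.begin() + ch.first, sorted.begin() + ch.second, x)
                     - (sorted.begin() + ch.first);
            return c;
        };
        // Smallest x with count_le(x) >= k, by binary search over the int range.
        long long lo = INT_MIN, hi = INT_MAX;
        while (lo < hi) {
            long long mid = lo + (hi - lo) / 2;
            if (count_le(mid) >= k) hi = mid; else lo = mid + 1;
        }
        return (int)lo;
    }
};
```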

Do a modified bucket sort: create a bucket containing the numbers in the required range, then sort only that bucket and find the k-th minimum.
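Here is a sketch of this idea. It reads the "bucket" as simply a copy of a[i..j]; `kth_by_copy` is an illustrative name. A query on a range of length L costs O(L log L), while an update is just `a[i] = x`, so all the work moves to the queries.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copy a[i..j] into its own bucket, sort only that bucket, pick the k-th (1-based).
int kth_by_copy(const std::vector<int>& a, int i, int j, int k) {
    assert(0 <= i && i <= j && j < (int)a.size() && 1 <= k && k <= j - i + 1);
    std::vector<int> bucket(a.begin() + i, a.begin() + j + 1);
    std::sort(bucket.begin(), bucket.end());
    return bucket[k - 1];
}
```

`std::nth_element` could replace the full sort, giving expected linear time per query.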

Damn, this solution can't update an element, but at least it finds the k-th element. It should give you some ideas for a solution that supports updates. Try pointer-based B-trees.

This takes O(n log n) space and O(q log^2 n) time. Further below I explain how to do the same with O(log n) per query.

So, here is what you need to do:

1) Build a "segment tree" over the given array.

2) For every node, instead of storing one number, store a whole array. The size of that array equals the number of leaves below the node (the length of its segment). As you guessed, that array contains the values of the bottom nodes (the numbers in that segment), but sorted.

3) To build such an array, merge the arrays of the node's two children in the segment tree. In addition, for every element of the merged array, remember where that number was before the merge (basically, which child array it comes from and its position there). Also store a pointer to the next element that did not come from the same child array.

4) With this structure, you can count how many numbers in some segment S are lower than a given value x. First find (with binary search) the first number in the root node's array that is >= x. Then, using the stored pointers, you can find the answer to the same question for the two child arrays in O(1). Stop descending at every node whose segment lies entirely inside or entirely outside S. The time complexity is O(log n): O(log n) to find the first element that is >= x, and O(log n) for all segments in the decomposition of S.

5) Do a binary search over the answer.

That was a solution with O(log^2 n) per query. But you can reduce it to O(log n):

1) Before doing everything I wrote above, transform the problem. Sort all numbers and remember the position of each one in the original array. These positions, listed in sorted order, form the array you will work on. Call that array P.

Suppose the bounds of the query segment are a and b. You need to find the k-th element of P whose value (not index) lies between a and b. That element is the index of your result in the original array.

2) To find that k-th element, you do a kind of backtracking search with complexity O(log n). You keep asking how many elements between index 0 and some other index of P have values between a and b.

3) Suppose you know the answer to that question for some prefix (0, h). Get the answers to the same question for all tree segments that begin at h, starting from the largest one. Keep going to smaller segments as long as the answer for (0, h) plus the last answer you got is at least k. When it drops below k, add that segment to the prefix, i.e. update h. Keep updating h until the only segment left to try at h is the single element h itself. That element of P is the index of the number you are looking for.

Answering that question for a segment of the tree takes exactly O(1) time. You already know the answer for its parent's segment, and with the pointers explained in the first algorithm you can get the answer for the current segment in O(1).
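Here is a sketch of the O(log n) version, again for a static array and with illustrative names. The tree is built over P, so each node stores the sorted original indices of its slice of P, with the same kind of `left_cnt` prefix counts. Instead of the prefix-extension loop of step 3, it walks down from the root. At each node, the prefix counts give in O(1) how many indices in [a, b] lie in the left child. The search then goes left, or goes right with k reduced by that number. At the root, the sorted index list is simply 0..n-1, so the search starts from positions a and b + 1 with no binary search at all.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct RankDescent {
    int n;
    std::vector<int> sorted_vals;            // sorted_vals[r] = arr[P[r]]
    std::vector<std::vector<int>> idx;       // idx[node]: sorted original indices in its slice of P
    std::vector<std::vector<int>> left_cnt;  // left_cnt[node][t]: left-child entries among first t

    explicit RankDescent(const std::vector<int>& arr)
        : n((int)arr.size()), sorted_vals(n), idx(4 * n), left_cnt(4 * n) {
        assert(n >= 1);
        std::vector<int> P(n);  // P[r] = original position of the r-th smallest value
        std::iota(P.begin(), P.end(), 0);
        std::stable_sort(P.begin(), P.end(), [&](int x, int y) { return arr[x] < arr[y]; });
        for (int r = 0; r < n; ++r) sorted_vals[r] = arr[P[r]];
        build(1, 0, n, P);
    }

    void build(int node, int lo, int hi, const std::vector<int>& P) {  // P-range [lo, hi)
        if (hi - lo == 1) { idx[node] = {P[lo]}; return; }
        int mid = (lo + hi) / 2;
        build(2 * node, lo, mid, P);
        build(2 * node + 1, mid, hi, P);
        const auto& L = idx[2 * node];
        const auto& R = idx[2 * node + 1];
        auto& out = idx[node];
        auto& cnt = left_cnt[node];
        cnt.push_back(0);
        size_t x = 0, y = 0;
        while (x < L.size() || y < R.size()) {
            bool take_left = y == R.size() || (x < L.size() && L[x] < R[y]);
            out.push_back(take_left ? L[x++] : R[y++]);
            cnt.push_back((int)x);
        }
    }

    // k-th smallest (1-based) of arr[a..b].
    int kth(int a, int b, int k) const {
        assert(0 <= a && a <= b && b < n && 1 <= k && k <= b - a + 1);
        // Entries [tl, tr) of the current node's idx list are the indices in [a, b].
        int tl = a, tr = b + 1;  // at the root, idx[1] is exactly 0..n-1
        int node = 1, lo = 0, hi = n;
        while (hi - lo > 1) {
            int mid = (lo + hi) / 2;
            int l_tl = left_cnt[node][tl], l_tr = left_cnt[node][tr];
            int in_left = l_tr - l_tl;  // indices in [a, b] within the left child
            if (k <= in_left) {
                node = 2 * node; hi = mid; tl = l_tl; tr = l_tr;
            } else {
                k -= in_left; node = 2 * node + 1; lo = mid; tl -= l_tl; tr -= l_tr;
            }
        }
        return sorted_vals[lo];
    }
};
```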