Fastest algorithm for Kth smallest Element (or median) finding on 2 Dimensional Array?

I have seen a lot of SO topics on related problems, but none of them provides an efficient method.

I want to find the k-th smallest element (or the median) of a 2D array [1..M][1..N] where each row is sorted in ascending order and all elements are distinct.

I think there is an O(M log MN) solution, but I have no idea how to implement it. (Median of medians, or partitioning with linear complexity, are possible methods, but I have no further ideas.)

This is an old Google interview question and can be searched on here.

But now I would like a hint about, or a description of, the most efficient (fastest) algorithm.

I also read a paper on it here, but I don't understand it.

Update 1: one solution is found here, but only for the case when the dimension is odd.

So to solve this problem, it helps to solve a slightly different one. We want to know the upper/lower bounds in each row for where the overall k'th cutoff is. Then we can go through and verify that the number of things at or below the lower bounds is < k, the number of things at or below the upper bounds is > k, and there is only one value between them.

I've come up with a strategy for doing a binary search in all rows simultaneously for those bounds. Being a binary search it "should" take O(log(n)) passes. Each pass involves O(m) work, for a total of O(m log(n)) time. I put "should" in quotes because I don't have a proof that it actually takes O(log(n)) passes. In fact it is possible to be too aggressive in a row, discover from other rows that the pivot chosen was off, and then have to back off. But I believe that it does very little backing off and actually is O(m log(n)).

The strategy is to keep track in each row of a lower bound, an upper bound, and a mid. In each pass we make a weighted series of ranges: from the start to lower, from lower to mid, from mid to upper, and from upper to the end, with the weight being the number of elements in the range and the value being the last element in the range. We then find the k'th value (by weight) in that data structure, and use that as a pivot for our binary search in each row.

If a pivot winds up outside the range from lower to upper, we correct by widening the interval in the direction that corrects the error.

When we have the correct sequence, we've got an answer.

There are a lot of edge cases, so staring at the full code may help.

I also assume that all elements of each row are distinct. If they are not, you can get into endless loops. (Solving that means even more edge cases...)

import random

# This takes (k, [(value1, weight1), (value2, weight2), ...])
def weighted_kth (k, pairs):
    # This does quickselect for average O(len(pairs)).
    # Median of medians is deterministically the same, but a bit slower
    pivot = pairs[int(random.random() * len(pairs))][0]

    # Which side of our answer is the pivot on?
    weight_under_pivot = 0
    pivot_weight = 0
    for value, weight in pairs:
        if value < pivot:
            weight_under_pivot += weight
        elif value == pivot:
            pivot_weight += weight

    if weight_under_pivot + pivot_weight < k:
        filtered_pairs = []
        for pair in pairs:
            if pivot < pair[0]:
                filtered_pairs.append(pair)
        return weighted_kth (k - weight_under_pivot - pivot_weight, filtered_pairs)
    elif k <= weight_under_pivot:
        filtered_pairs = []
        for pair in pairs:
            if pair[0] < pivot:
                filtered_pairs.append(pair)
        return weighted_kth (k, filtered_pairs)
    else:
        return pivot

# This takes (k, [[...], [...], ...])
def kth_in_row_sorted_matrix (k, matrix):
    # The strategy is to discover the k'th value, and also discover where
    # that would be in each row.
    #
    # For each row we will track what we think the lower and upper bounds
    # are on where it is.  Those bounds start as the start and end and
    # will do a binary search.
    #
    # In each pass we will break each row into ranges from start to lower,
    # lower to mid, mid to upper, and upper to end.  Some ranges may be
    # empty.  We will then create a weighted list of ranges with the weight
    # being the length, and the value being the end of the list.  We find
    # where the k'th spot is in that list, and use that approximate value
    # to refine each range.  (There is a chance that a range is wrong, and
    # we will have to deal with that.)
    #
    # We finish when all of the uppers are above our k, all of the lowers
    # are below, and the upper/lower gap is more than 1 only when our
    # k'th element is in the middle.

    # Our data structure is simply [row, lower, upper, bound] for each row.
    data = [[row, 0, min(k, len(row)-1), min(k, len(row)-1)] for row in matrix]
    is_search = True
    while is_search:
        pairs = []
        for row, lower, upper, bound in data:
            # Literal edge cases
            if 0 == upper:
                pairs.append((row[upper], 1))
                if upper < bound:
                    pairs.append((row[bound], bound - upper))
            elif lower == bound:
                pairs.append((row[lower], lower + 1))
            elif lower + 1 == upper: # No mid.
                pairs.append((row[lower], lower + 1))
                pairs.append((row[upper], 1))
                if upper < bound:
                    pairs.append((row[bound], bound - upper))
            else:
                mid = (upper + lower) // 2
                pairs.append((row[lower], lower + 1))
                pairs.append((row[mid], mid - lower))
                pairs.append((row[upper], upper - mid))
                if upper < bound:
                    pairs.append((row[bound], bound - upper))

        pivot = weighted_kth(k, pairs)

        # Now that we have our pivot, we try to adjust our parameters.
        # If any adjusts we continue our search.
        is_search = False
        new_data = []
        for row, lower, upper, bound in data:
            # First cases where our bounds weren't bounds for our pivot.
            # We rebase the interval and either:
            # - double the size of the range
            # - go halfway to the edge
            if 0 < lower and pivot <= row[lower]:
                is_search = True
                if pivot == row[lower]:
                    new_data.append((row, lower-1, min(lower+1, bound), bound))
                elif upper <= lower:
                    new_data.append((row, lower-1, lower, bound))
                else:
                    new_data.append((row, max(lower // 2, lower - 2*(upper - lower)), lower, bound))
            elif upper < bound and row[upper] <= pivot:
                is_search = True
                if pivot == row[upper]:
                    new_data.append((row, upper-1, upper+1, bound))
                elif lower < upper:
                    new_data.append((row, upper, min((upper+bound+1)//2, upper + 2*(upper - lower)), bound))
                else:
                    new_data.append((row, upper, upper+1, bound))
            elif lower + 1 < upper:
                if upper == lower+2 and pivot == row[lower+1]:
                    new_data.append((row, lower, upper, bound)) # Looks like we found the pivot.
                else:
                    # We will split this interval.
                    is_search = True
                    mid = (upper + lower) // 2
                    if row[mid] < pivot:
                        new_data.append((row, mid, upper, bound))
                    elif pivot < row[mid]:
                        new_data.append((row, lower, mid, bound))
                    else:
                        # We center our interval on the pivot
                        new_data.append((row, (lower+mid)//2, (mid+upper+1)//2, bound))
            else:
                # We look like we found where the pivot would be in this row.
                new_data.append((row, lower, upper, bound))
        data = new_data # And set up the next search
    return pivot
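
Because the search above has so many edge cases, it may help to compare its results against a trivially correct baseline. The following is only a minimal sketch of such a baseline, not part of the original answer: the name `kth_smallest_reference` is hypothetical, and it simply flattens and sorts the matrix in O(MN log(MN)), so it is useful for testing on small inputs rather than for speed.

```python
def kth_smallest_reference(k, matrix):
    """Return the k-th smallest element (1-based) by flattening and sorting.

    Hypothetical test helper: slow, but easy to trust, so it can be used to
    check a faster implementation on small or random matrices.
    """
    flat = sorted(value for row in matrix for value in row)
    if not 1 <= k <= len(flat):
        raise ValueError("k must be between 1 and the number of elements")
    return flat[k - 1]


# Each row is sorted ascending and all elements are distinct.
sample = [
    [1, 5, 9],
    [2, 6, 10],
    [3, 4, 11],
]
print(kth_smallest_reference(5, sample))
```

Running a faster implementation and this baseline over many small random row-sorted matrices, for every valid k, is one way to catch the edge cases mentioned above.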

Another answer has been added to provide an actual solution. This one has been left as it was because of quite the rabbit hole in the comments.


I believe the fastest solution for this is the k-way merge algorithm. It is an O(N log K) algorithm to merge K sorted lists with a total of N items into a single sorted list of size N.

https://en.wikipedia.org/wiki/K-way_merge_algorithm#k-way_merge

Given an MxN list, this ends up being O(MNlog(M)). However, that is for sorting the entire list. Since you only need the first K smallest items instead of all N*M, the performance is O(Klog(M)). This is quite a bit better than what you are looking for, assuming O(K) <= O(M).

This does assume you have N sorted lists of size M. If you actually have M sorted lists of size N, this can be easily handled just by changing how you loop over the data (see the pseudocode below), though it does mean the performance is O(K log(N)) instead.

A k-way merge just adds the first item of each list to a heap or other data structure with O(log N) insert and O(log N) find-min.

Pseudocode for k-way merge looks a bit like this:

  1. For each sorted list, insert the first value into the data structure with some means of determining which list the value came from. E.g.: you might insert [value, row_index, col_index] into the data structure instead of just value. This also lets you easily handle looping over either columns or rows.
  2. Remove the lowest value from the data structure and append it to the sorted list.
  3. Given that the item in step #2 came from list I, add the next lowest value from list I to the data structure. E.g.: if the value was at row 5 col 4 (data[5][4]), then if you are using rows as lists, the next value would be row 5 col 5 (data[5][5]). If you are using columns, then the next value is row 6 col 4 (data[6][4]). Insert this next value into the data structure as you did in #1 (i.e.: [value, row_index, col_index]).
  4. Go back to step 2 as needed.

For your needs, do steps 2-4 K times.
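
The steps above can be sketched in Python with the standard `heapq` module. This is a minimal illustration under the assumption that each row of the matrix is one sorted list; the function name `kth_smallest_kway` is hypothetical, and the heap entries are (value, row_index, col_index) tuples as suggested in step 1.

```python
import heapq


def kth_smallest_kway(matrix, k):
    """k-th smallest (1-based) of a matrix whose rows are sorted ascending."""
    total = sum(len(row) for row in matrix)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of elements")

    # Step 1: seed the heap with the first item of every non-empty row,
    # remembering which row and column each value came from.
    heap = [(row[0], r, 0) for r, row in enumerate(matrix) if row]
    heapq.heapify(heap)

    # Steps 2-4, repeated k times: pop the smallest value, then push the
    # next value from the same row (if that row has one left).
    for _ in range(k):
        value, r, c = heapq.heappop(heap)
        if c + 1 < len(matrix[r]):
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))

    # The k-th value popped is the k-th smallest overall.
    return value


rows = [
    [1, 5, 9],
    [2, 6, 10],
    [3, 4, 11],
]
print(kth_smallest_kway(rows, 5))
```

With M rows the heap never holds more than M entries, so this sketch does O(M) work to build the heap plus O(log M) per extracted item.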

It seems like the best way to go is a k-way merge in increasingly larger blocks. A k-way merge seeks to build a sorted list, but we don't need it sorted and we don't need to consider each element. Instead we'll create semi-sorted intervals. The intervals will be sorted, but only on their highest value.

https://en.wikipedia.org/wiki/K-way_merge_algorithm#k-way_merge

We use the same approach as a k-way merge, but with a twist. Basically it aims to indirectly build a semi-sorted sublist. For example, instead of finding [1,2,3,4,5,6,7,8,10] to determine the K=10, it will instead find something like [(1,3),(4,6),(7,15)]. With a k-way merge we consider 1 item at a time from each list. In this approach however, when pulling from a given list, we want to first consider Z items, then 2 * Z items, then 2 * 2 * Z items, so 2^i * Z items for the i-th time. Given an MxN matrix, that means it will require that we pull up to O(log(N)) items from the list M times.

  1. For each sorted list, insert the first K sublists into the data structure with some means of determining which list the value came from. We want the data structure to use the highest value in the sublist we insert into it. In this case we would want something like [max_value of sublist, row index, start_index, end_index]. O(m)
  2. Remove the lowest value (this is now a list of values) from the data structure and append it to the sorted list. O(log (m))
  3. Given that the item in step #2 came from list I, add the next 2^i * Z values from list I to the data structure upon the i-th time pulling from that specific list (basically just double the number that was present in the sublist just removed from the data structure). O(log m)
  4. If the size of the semi-sorted sublist is greater than K, use binary search to find the kth value. O(log N). If there are any sublists remaining in the data structure where the min value is less than k, go to step 1 with the lists as inputs and the new K being k - (size of semi-sorted list).
  5. If the size of the semi-sorted sublist is equal to K, return the last value in the semi-sorted sublist; this is the Kth value.
  6. If the size of the semi-sorted sublist is less than K, go back to step 2.

As for performance, let's see here:

  • It takes O(m log m) to add the initial values to the data structure.
  • It needs to consider at most O(m) sublists, each requiring O(log n) time, for O(m log n).
  • It needs to perform a binary search at the end, O(log m). It may need to reduce the problem into recursive sublists if there is uncertainty about what the value of K is (step 4), but I don't think that'll affect the big O. Edit: I believe this just adds another O(mlog(n)) in the worst case, which has no effect on the big O.

So it looks like it's O(mlog(m) + mlog(n)) or simply O(mlog(mn)).

As an optimization, if K is above NM/2, consider the max value where you would consider the min value, and the min value where you would consider the max value. This will greatly increase the performance when K is close to NM.
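
The mirroring idea in the last paragraph can be illustrated on its own. The sketch below is an assumption-laden simplification: it applies the optimization to a plain one-item-at-a-time k-way merge, not to the block-doubling variant described above, and it assumes numeric values so that negation can turn Python's min-heap into a max-heap. The function name `kth_smallest_either_end` is hypothetical.

```python
import heapq


def kth_smallest_either_end(matrix, k):
    """k-th smallest (1-based), merging from whichever end needs fewer pops."""
    total = sum(len(row) for row in matrix)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of elements")

    # The k-th smallest is also the (total - k + 1)-th largest.
    from_top = k > total - k + 1
    steps = total - k + 1 if from_top else k
    sign = -1 if from_top else 1  # negated keys make heapq behave as a max-heap

    # Seed the heap with the last (or first) item of every non-empty row.
    heap = []
    for r, row in enumerate(matrix):
        if row:
            c = len(row) - 1 if from_top else 0
            heap.append((sign * row[c], r, c))
    heapq.heapify(heap)

    for _ in range(steps):
        key, r, c = heapq.heappop(heap)
        nxt = c - 1 if from_top else c + 1  # walk the row inward from the chosen end
        if 0 <= nxt < len(matrix[r]):
            heapq.heappush(heap, (sign * matrix[r][nxt], r, nxt))

    return sign * key


grid = [
    [1, 5, 9],
    [2, 6, 10],
    [3, 4, 11],
]
print(kth_smallest_either_end(grid, 8))
```

For k = 8 out of 9 elements this pops only two items from the top instead of eight from the bottom, which is the saving the optimization is after.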

The answers by btilly and Nuclearman provide two different approaches: a kind of binary search and a k-way merge of the rows.

My proposal is to combine both methods.

  • If k is small enough (let's say less than M times 2 or 3) or big enough (by symmetry, close to N x M), find the k-th element with an M-way merge of the rows. Of course, we shouldn't merge all the elements, just the first k.

  • Otherwise, start by inspecting the first and the last column of the matrix in order to find the minimum (which is in the first column) and the maximum (in the last column) values.

  • Estimate a first pivot value as a linear combination of those two values. Something like pivot = min + k * (max - min) / (N * M).

  • Perform a binary search in each row to determine the last element (the closest) not greater than the pivot. The number of elements less than or equal to the pivot follows directly. Comparing the sum of those counts with k tells whether the chosen pivot value is too big or too small, and lets us modify it accordingly. Keep track of the maximum value between all the rows; it may be the k-th element or just be used to evaluate the next pivot. If we consider said sum as a function of the pivot, the numeric problem is now to find the zero of sum(pivot) - k, which is a monotonic (discrete) function. At worst, we can use the bisection method (logarithmic complexity) or the secant method.

  • We can ideally partition each row into three ranges:

    • At the left, the elements which are surely less than or equal to the k-th element.
    • In the middle, the undetermined range.
    • At the right, the elements which are surely greater than the k-th element.
  • The undetermined range will shrink at every iteration, eventually becoming empty for most rows. At some point, the number of elements still in the undetermined ranges, scattered throughout the matrix, will be small enough to resort to a single M-way merge of those ranges.

  • If we consider the time complexity of a single iteration as O(MlogN), or M binary searches, we need to multiply it by the number of iterations required for the pivot to converge to the value of the k-th element, which could be O(logNM). This sums up to O(MlogNlogM), or O(MlogNlogN) if N > M.

  • Note that, if the algorithm is used to find the median, with the M-way merge as the last step it is easy to find the (k + 1)-th element too.
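
The pivot-counting step described in the list above can be sketched as follows. This is a simplified illustration, not the full proposal: it assumes integer values, uses plain bisection on the value range instead of the interpolated first pivot or the secant method, and skips the final M-way merge of the undetermined ranges. The names `count_at_most` and `kth_smallest_by_value_bisection` are hypothetical.

```python
from bisect import bisect_right


def count_at_most(rows, pivot):
    """Number of elements <= pivot, using one binary search per sorted row."""
    return sum(bisect_right(row, pivot) for row in rows)


def kth_smallest_by_value_bisection(matrix, k):
    """k-th smallest (1-based) of an integer matrix with ascending rows."""
    rows = [row for row in matrix if row]
    total = sum(len(row) for row in rows)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of elements")

    lo = min(row[0] for row in rows)    # overall minimum is in the first column
    hi = max(row[-1] for row in rows)   # overall maximum is in the last column

    # Find the smallest value v with count_at_most(v) >= k. The count is
    # monotonic in v, so bisection on the value range converges to it.
    while lo < hi:
        pivot = (lo + hi) // 2
        if count_at_most(rows, pivot) >= k:
            hi = pivot          # pivot is at or above the k-th element
        else:
            lo = pivot + 1      # pivot is below the k-th element
    return lo


table = [
    [1, 5, 9],
    [2, 6, 10],
    [3, 4, 11],
]
print(kth_smallest_by_value_bisection(table, 7))
```

Note that in this simplified form the number of iterations depends on the spread of the values (max - min) rather than on N x M, which is one reason the answer suggests interpolating the pivot and finishing with a merge once few candidates remain.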

Maybe I am missing something, but if your NxM matrix A has M rows that are already sorted ascending with no repeated elements, then the k-th smallest value of a row is just the k-th element of that row, which is O(1). To move to 2D you just select the k-th column instead, sort it ascending in O(M.log(M)), and again pick the k-th element, for a total of O(M.log(M)).

  1. let's have matrix A[N][M]

    where elements are A[column][row]

  2. sort the k-th column of A ascending, O(M.log(M))

    so sort A[k][i] where i = { 1,2,3,...M } ascending

  3. pick A[k][k] as the result

In case you want the k-th smallest of all the elements in A instead, then you need to exploit the already sorted rows in a form similar to merge sort.

  1. create empty list c[] for holding the k smallest values

  2. process columns

  3. create temp array b[]

    which holds the processed column quick sorted ascending, O(N.log(N))

  4. merge c[] and b[] so c[] holds up to k smallest values

    Using temp array d[] will lead to O(k+n)

  5. if no item from b was used during merging, stop processing columns

    This can be done by adding a flag array f which holds whether each value was taken from b or c during the merge, and then just checking if any value was taken from b

  6. output c[k-1]

When put all together, the final complexity is O(min(k,M).N.log(N)). If we consider that k is less than M, we can rewrite it to O(kNlog(N)), otherwise O(MNlog(N)). Also, on average the number of columns to iterate will be even lower, more likely ~(1+(k/N)), so the average complexity would be ~O(N.log(N)), but that is just my wild guess, which might be wrong.

Here is a small C++/VCL example:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "Unit1.h"
#include "sorts.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
const int m=10,n=8; int a[m][n],a0[m][n]; // a[col][row]
//---------------------------------------------------------------------------
void generate()
    {
    int i,j,k,ii,jj,d=13,b[m];
    Randomize();
    RandSeed=0x12345678;
    // a,a0 = some distinct pseudorandom values (fully ordered asc)
    for (k=Random(d),j=0;j<n;j++)
     for (i=0;i<m;i++,k+=Random(d)+1)
      { a0[i][j]=k; a[i][j]=k; }
    // shuffle a
    for (j=0;j<n;j++)
     for (i=0;i<m;i++)
        {
        ii=Random(m);
        jj=Random(n);
        k=a[i][j]; a[i][j]=a[ii][jj]; a[ii][jj]=k;
        }
    // sort rows asc
    for (j=0;j<n;j++)
        {
        for (i=0;i<m;i++) b[i]=a[i][j];
        sort_asc_quick(b,m);
        for (i=0;i<m;i++) a[i][j]=b[i];
        }

    }
//---------------------------------------------------------------------------
int kmin(int k) // k-th min (1-based) from a[m][n] where a rows are already sorted
    {
    int i,j,bi,ci,di,b[n],*buf,*c,*d,*e,*f,cn;
    // handle edge cases before allocating anything (avoids leaking the buffer)
    if (m<1) return -1;
    if ((k<1)||(k>m*n)) return -1;
    // m==1 needs no special case: the single column is sorted by the loop below
    // each buffer has k+1 items because the merge writes one slot past k before clamping
    buf=new int[3*(k+1)]; c=buf; d=c+(k+1); f=d+(k+1);
    // process columns
    for (cn=0,i=0;i<m;i++)
        {
        // b[] = sorted_asc a[i][]
        for (j=0;j<n;j++) b[j]=a[i][j];     // O(n)
        sort_asc_quick(b,n);                // O(n.log(n))
        // c[] = c[] + b[] asc sorted and limited to cn size
        for (bi=0,ci=0,di=0;;)              // O(k+n)
            {
                 if ((ci>=cn)&&(bi>=n)) break;
            else if (ci>=cn)     { d[di]=b[bi]; f[di]=1; bi++; di++; }
            else if (bi>= n)     { d[di]=c[ci]; f[di]=0; ci++; di++; }
            else if (b[bi]<c[ci]){ d[di]=b[bi]; f[di]=1; bi++; di++; }
            else                 { d[di]=c[ci]; f[di]=0; ci++; di++; }
            if (di>k) di=k;
            }
        e=c; c=d; d=e; cn=di;
        for (ci=0,j=0;j<cn;j++) ci|=f[j];   // O(k)
        if (!ci) break;
        }
    k=c[k-1];
    delete[] buf;                           // c and d get swapped, so free the original pointer
    return k;
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
    {
    int i,j,k;
    AnsiString txt="";

    generate();

    txt+="a0[][]\r\n";
    for (j=0;j<n;j++,txt+="\r\n")
     for (i=0;i<m;i++) txt+=AnsiString().sprintf("%4i ",a0[i][j]);

    txt+="\r\na[][]\r\n";
    for (j=0;j<n;j++,txt+="\r\n")
     for (i=0;i<m;i++) txt+=AnsiString().sprintf("%4i ",a[i][j]);

    k=20;
    txt+=AnsiString().sprintf("\r\n%ith smallest from a0 = %4i\r\n",k,a0[(k-1)%m][(k-1)/m]);
    txt+=AnsiString().sprintf("\r\n%ith smallest from a  = %4i\r\n",k,kmin(k));

    mm_log->Lines->Add(txt);
    }
//-------------------------------------------------------------------------

Just ignore the VCL stuff. The function generate computes the matrices a0 and a, where a0 is fully sorted and a has only its rows sorted, and all values are distinct. The function kmin is the algorithm described above, returning the k-th smallest value from a[m][n]. For sorting I used this:

template <class T> void sort_asc_quick(T *a,int n)
    {
    int i,j; T a0,a1,p;
    if (n<=1) return;                                   // stop recursion
    if (n==2)                                           // edge case
        {
        a0=a[0];
        a1=a[1];
        if (a0>a1) { a[0]=a1; a[1]=a0; }                // condition
        return;
        }
    for (a0=a1=a[0],i=0;i<n;i++)                        // pivot = middle (should be median)
        {
        p=a[i];
        if (a0>p) a0=p;
        if (a1<p) a1=p;
        } if (a0==a1) return; p=(a0+a1+1)/2;            // if the same values stop
    if (a0==p) p++;
    for (i=0,j=n-1;i<=j;)                               // regroup
        {
        a0=a[i];
        if (a0<p) i++; else { a[i]=a[j]; a[j]=a0; j--; }// condition
        }
    sort_asc_quick(a  ,  i);                            // recursion a[]<=p
    sort_asc_quick(a+i,n-i);                            // recursion a[]> p
    }

And here is the output:

a0[][]
  10   17   29   42   54   66   74   85   90  102 
 112  114  123  129  142  145  146  150  157  161 
 166  176  184  191  195  205  213  216  222  224 
 226  237  245  252  264  273  285  290  291  296 
 309  317  327  334  336  349  361  370  381  390 
 397  398  401  411  422  426  435  446  452  462 
 466  477  484  496  505  515  522  524  525  530 
 542  545  548  553  555  560  563  576  588  590 

a[][]
 114  142  176  264  285  317  327  422  435  466 
 166  336  349  381  452  477  515  530  542  553 
 157  184  252  273  291  334  446  524  545  563 
  17  145  150  237  245  290  370  397  484  576 
  42  129  195  205  216  309  398  411  505  560 
  10  102  123  213  222  224  226  390  496  555 
  29   74   85  146  191  361  426  462  525  590 
  54   66   90  112  161  296  401  522  548  588 

20th smallest from a0 =  161

20th smallest from a  =  161

This example iterated only 5 columns...
