Minimum number of comparisons needed to use merge sort algorithm?

For those of you familiar with merge sort, I'm trying to figure out the minimum number of comparisons needed to merge two subarrays of size n/2, where n is the number of items in the original unsorted array.

I know the average- and worst-case time complexity of the algorithm is O(n log n), but I can't figure out the exact minimum number of comparisons needed (in terms of n).

The minimum number of comparisons for the merge step is n/2 when both halves have n/2 elements (which by the way is still O(n)), assuming a sane implementation that stops comparing once one of the lists has been fully traversed.

For example, if every element of one list is smaller than every element of the other (so the two lists are effectively already sorted relative to each other), then the first member of the list with the larger values is compared once with each of the n/2 members of the other list until that list is exhausted; the list with the larger values can then be copied over without further comparisons.

List 1    List 2    Merged List         Last Comparison
[1, 2, 3] [4, 5, 6] []                  N/A
[2, 3]    [4, 5, 6] [1]                 1 < 4
[3]       [4, 5, 6] [1, 2]              2 < 4
[]        [4, 5, 6] [1, 2, 3]           3 < 4
[]        [5, 6]    [1, 2, 3, 4]        N/A
[]        [6]       [1, 2, 3, 4, 5]     N/A
[]        []        [1, 2, 3, 4, 5, 6]  N/A

Note that 3 comparisons were made, with 6 members in the list.
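The table above can be reproduced with a small counting merge. This is only a sketch: the function name `merge_count` and the tie-breaking rule (take from the left list on equality) are assumptions made for illustration, not part of the original answer.

```python
def merge_count(left, right):
    """Merge two sorted lists; return (merged_list, number_of_comparisons)."""
    merged, comparisons = [], 0
    i = j = 0
    # Heads are compared only while both lists still have elements.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted: copy the rest without any comparison.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


merged, comparisons = merge_count([1, 2, 3], [4, 5, 6])
print(merged, comparisons)  # [1, 2, 3, 4, 5, 6] 3
```

With interleaved inputs such as `[1, 3, 5]` and `[2, 4, 6]`, the same function needs 5 comparisons, because neither list runs out early.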

Again, note that the merge step is still considered O(n) even in the best case. The merge sort algorithm has time complexity O(n*lg(n)) because the merges at each level of recursion cost O(n) in total across the whole list, and the divide/merge happens over O(lg(n)) levels of recursion.

This answer gives an exact result, not only the asymptotic behaviour written using some Landau symbol.

Merging lists of lengths m and n takes at least min(m, n) comparisons. The reason is that you can stop comparing elements only when one of the input lists has been completely processed, i.e. you'll need to iterate over at least the smaller of the two lists. Note that this number of comparisons will only be sufficient for some inputs, so it is minimal in the sense that it assumes the best case of possible input data. For worst case input, you will find higher numbers, namely n⌈lg n⌉ − 2^⌈lg n⌉ + 1.

Let n = 2^k be a power of two. Let i be a merge level, with 0 ≤ i < k. At level i you execute 2^(k−i−1) merges, each of which requires 2^i comparisons. Multiplying these two numbers gives you 2^(k−1) comparisons, which is equal to n/2. Summing over the k levels of merges you get nk/2 = (n lg n)/2 comparisons.
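Written out as a single sum, the argument for n = 2^k looks as follows. The underbrace labels are added here for readability only.

```latex
\sum_{i=0}^{k-1}
  \underbrace{2^{\,k-i-1}}_{\text{merges at level } i}
  \cdot
  \underbrace{2^{\,i}}_{\text{comparisons per merge}}
  \;=\; \sum_{i=0}^{k-1} 2^{\,k-1}
  \;=\; k \cdot 2^{\,k-1}
  \;=\; \frac{n \lg n}{2}
```

For instance, n = 8 gives k = 3 levels with 4 comparisons each (4 merges of 1 comparison, 2 merges of 2, 1 merge of 4), so 12 comparisons in total.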

Now let n be 1 less than a power of two. Let k = ⌈lg n⌉ still denote the number of merge levels. Compared to the 2^k case, you now have one less comparison at each level. So the total number of comparisons reduces by k, resulting in 2^k k/2 − k = (2^k/2 − 1)k comparisons. However, if you remove one more element, leading to n = 2^k − 2, then you won't reduce the number of comparisons in the topmost merge, since the other list already is the shorter one. Which suggests that things might become more difficult around here.

So let's have a little demo program, which we can use both to check our previous result and to compute the number of comparisons for other values:

mc = [0, 0]                                 # dynamic programming, cache previous results
k = 1                                       # ceil(lg n) in the loop
for n in range(2, 128):
    a = n // 2                              # split list near center
    b = n - a                               # compute length of other half list
    mc.append(mc[a] + mc[b] + min(a, b))    # need to sort these and then merge
    if (n & (n - 1)) == 0:                  # if n is a power of two
        assert mc[-1] == n * k // 2         # check previous result (integer arithmetic)
        k += 1                              # increment k = ceil(lg n)
print(', '.join(str(m) for m in mc))        # print sequence of comparison counts, starting at n = 0

This gives you the following sequence:

0, 0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35,
37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71, 75, 80, 81, 83, 85,
88, 90, 93, 96, 100, 102, 105, 108, 112, 115, 119, 123, 128, 130, 133,
136, 140, 143, 147, 151, 156, 159, 163, 167, 172, 176, 181, 186, 192,
193, 195, 197, 200, 202, 205, 208, 212, 214, 217, 220, 224, 227, 231,
235, 240, 242, 245, 248, 252, 255, 259, 263, 268, 271, 275, 279, 284,
288, 293, 298, 304, 306, 309, 312, 316, 319, 323, 327, 332, 335, 339,
343, 348, 352, 357, 362, 368, 371, 375, 379, 384, 388, 393, 398, 404,
408, 413, 418, 424, 429, 435, 441

which you can look up in the On-Line Encyclopedia of Integer Sequences to find that, after dropping the leading 0, this sequence (A000788) describes the total number of 1's in binary expansions of 0, ..., n. In the indexing of the program above, this means that mc[n] is the total number of 1's in the binary expansions of 0, ..., n − 1. There are some formulas there as well, but either they are inexact (involve some Landau symbol term), or they rely on some other non-trivial sequence, or they are pretty complex. The one I like most expresses just what my program above did:

a(0) = 0, a(2n) = a(n)+a(n-1)+n, a(2n+1) = 2a(n)+n+1. - Ralf Stephan, Sep 13 2003

Given these alternatives I guess I'd stick with the above script to compute these numbers. You can remove the assertion and everything related to it, rely on the fact that a ≤ b (so that min(a, b) is simply a), and drop the output as well if you include this in a larger program. The result should look like this:

mc = [0, 0]
for n in range(2, 1024):
    a = n // 2
    mc.append(mc[a] + mc[n - a] + a)
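As a cross-check, the bit-counting description and the quoted recurrence can both be compared against the script's numbers. The sketch below assumes the one-position index shift between `mc` and the OEIS-style sequence a(n); the helper names `best_case_counts`, `ones_up_to` and `stephan` are made up for this illustration.

```python
def best_case_counts(limit):
    """mc[n] = best-case comparisons of top-down merge sort on n items."""
    mc = [0, 0]
    for n in range(2, limit):
        a = n // 2
        mc.append(mc[a] + mc[n - a] + a)
    return mc


def ones_up_to(limit):
    """ones[m] = total number of 1 bits in the binary expansions of 0..m."""
    ones, total = [], 0
    for m in range(limit):
        total += bin(m).count("1")
        ones.append(total)
    return ones


def stephan(limit):
    """a(0) = 0, a(2n) = a(n) + a(n-1) + n, a(2n+1) = 2a(n) + n + 1."""
    a = [0]
    for m in range(1, limit):
        n = m // 2
        if m % 2 == 0:
            a.append(a[n] + a[n - 1] + n)
        else:
            a.append(2 * a[n] + n + 1)
    return a


LIMIT = 1024
mc = best_case_counts(LIMIT)
ones = ones_up_to(LIMIT)

# The recurrence reproduces the bit counts ...
assert stephan(LIMIT) == ones
# ... and the merge sort counts are the same numbers shifted by one index.
assert all(mc[n] == ones[n - 1] for n in range(1, LIMIT))
print(mc[:9])  # [0, 0, 1, 2, 4, 5, 7, 9, 12]
```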

Notice that e.g. for n = 3 you get only two comparisons. Clearly this can only work if you compare both extremal elements to the median one, so that you don't have to compare the extremal ones to one another any more. This illustrates why the above computation only works for best case input. Worst case input would have you comparing the minimal and maximal elements with one another at some point, leading to three comparisons as computed by the n⌈lg n⌉ − 2^⌈lg n⌉ + 1 formula.
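To see both extremes at once, one can count comparisons over every ordering of a small input. The following is a brute-force sketch, assuming a top-down merge sort that splits at n // 2 as in the script above; `merge_sort`, `comparison_range` and `worst_case_formula` are illustrative names, and the approach is only practical for small n.

```python
from itertools import permutations


def merge_sort(items, counter):
    """Top-down merge sort; counter[0] accumulates element comparisons."""
    if len(items) < 2:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Remaining elements are copied without comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def comparison_range(n):
    """(fewest, most) comparisons over all orderings of n distinct items."""
    counts = []
    for perm in permutations(range(n)):
        counter = [0]
        assert merge_sort(perm, counter) == list(range(n))
        counts.append(counter[0])
    return min(counts), max(counts)


def worst_case_formula(n):
    k = (n - 1).bit_length()  # equals ceil(lg n) for n >= 1
    return n * k - 2**k + 1


print(comparison_range(3))    # (2, 3)
print(worst_case_formula(3))  # 3
```

For n = 3 the minimum of 2 is reached by already sorted input, while an ordering such as (1, 0, 2), where the median comes first, needs all 3 comparisons.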

For every comparison, you discharge one element from one of the two lists, and the last remaining element is copied over without any comparison. So the number of comparisons is at most the sum of the lengths of the two lists minus one. As Platinum demonstrates, it may be less if you reach the end of one array and the other still has items in it.

So the number of comparisons for merging two halves of size n/2 is between n/2 and n − 1.
