Minimum number of comparisons needed to use merge sort algorithm?

For those of you familiar with merge sort, I'm trying to figure out the minimum number of comparisons needed to merge two subarrays of size n/2, where n is the number of items in the original unsorted array.

I know the average- and worst-case time complexity of the algorithm is O(nlogn), but I can't figure out the exact minimum number of comparisons needed (in terms of n).

The minimum number of comparisons for the merge step is approximately n/2 (which, by the way, is still O(n)), assuming a sane implementation that stops comparing once one of the lists has been fully traversed.

For example, when every element of one list is smaller than every element of the other (as happens when the input is effectively already sorted), the first member of the list with the larger values is compared against each of the n/2 members of the other list until that list is exhausted; the remaining list can then be copied over without further comparisons.

List 1    List 2    Merged List         Last Comparison
[1, 2, 3] [4, 5, 6] []                  N/A
[2, 3]    [4, 5, 6] [1]                 1 < 4
[3]       [4, 5, 6] [1, 2]              2 < 4
[]        [4, 5, 6] [1, 2, 3]           3 < 4
[]        [5, 6]    [1, 2, 3, 4]        N/A
[]        [6]       [1, 2, 3, 4, 5]     N/A
[]        []        [1, 2, 3, 4, 5, 6]  N/A

Note that 3 comparisons were made, with 6 members in the list.
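The trace above can be reproduced with a small counting merge. This is only a sketch: the name merge_count is made up for illustration, and it assumes both inputs are already sorted.

```python
def merge_count(left, right):
    """Merge two sorted lists; return (merged_list, number_of_comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Compare heads only while both lists still have elements.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted: copy the rest without any comparison.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


print(merge_count([1, 2, 3], [4, 5, 6]))  # ([1, 2, 3, 4, 5, 6], 3)
print(merge_count([1, 3, 5], [2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 5)
```

The second call interleaves the two lists, which forces n - 1 = 5 comparisons instead of n/2 = 3.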

Again, note that the merge step is still effectively considered O(n) even in the best case. The merge sort algorithm has time complexity O(n*lg(n)) because the merge step is O(n) across the whole list, and the divide/merge happens with O(lg(n)) levels of recursion.

This answer gives an exact result, not only the asymptotic behaviour written using some Landau symbol.

Merging lists of lengths m and n takes at least min(m, n) comparisons. The reason is that you can stop comparing elements only when one of the input lists has been completely processed, i.e. you'll need to iterate over at least the smaller of the two lists. Note that this number of comparisons will only be sufficient for some inputs, so it is minimal in the sense that it assumes the best case of possible input data. For worst-case input, you will find higher numbers, namely n⌈lg n⌉ − 2^⌈lg n⌉ + 1.

Let n = 2^k be a power of two. Let i be a merge level, with 0 ≤ i < k. At level i you execute 2^(k−i−1) merges, each of which requires 2^i comparisons. Multiplying these two numbers gives you 2^(k−1) comparisons, which is equal to n/2. Summing over the k levels of merges you get nk/2 = (n lg n)/2 comparisons.
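As a quick sanity check of this level-by-level count, the sum can be evaluated directly. The function name best_case_power_of_two is just an illustrative choice, and the sketch assumes n is an exact power of two.

```python
def best_case_power_of_two(k):
    """Sum the best-case comparisons over all merge levels for n = 2**k."""
    total = 0
    for i in range(k):
        merges = 2 ** (k - i - 1)   # number of merges executed at level i
        per_merge = 2 ** i          # best case: length of each merged run
        total += merges * per_merge
    return total


for k in range(1, 11):
    n = 2 ** k
    # Each level contributes n/2, so the total is n*k/2 = (n lg n)/2.
    assert best_case_power_of_two(k) == n * k // 2
```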

Now let n be 1 less than a power of two. Let k = ⌈lg n⌉ still denote the number of merge levels. Compared to the 2^k case, you now have one less comparison at each level. So the total number of comparisons reduces by k, resulting in 2^k·k/2 − k = (2^k/2 − 1)k comparisons. However, if you remove one more element, leading to n = 2^k − 2, then you won't reduce the number of comparisons in the topmost merge, since the other list already is the shorter one. This suggests that things might become more difficult around here.

So let's have a little demo program, which we can use both to check our previous result and to compute the number of comparisons for other values:

mc = [0, 0]                                 # dynamic programming, cache previous results
k = 1                                       # ceil(lg n) in the loop
for n in range(2, 128):
    a = n // 2                              # split list near center
    b = n - a                               # compute length of other half list
    mc.append(mc[a] + mc[b] + min(a, b))    # need to sort these and then merge
    if (n & (n - 1)) == 0:                  # if n is a power of two
        assert mc[-1] == n * k // 2         # check previous result (integer arithmetic)
        k += 1                              # increment k = ceil(lg n)
print(', '.join(str(m) for m in mc))        # print sequence of comparison counts, starting at n = 0

This gives you the following sequence:

0, 0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35,
37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71, 75, 80, 81, 83, 85,
88, 90, 93, 96, 100, 102, 105, 108, 112, 115, 119, 123, 128, 130, 133,
136, 140, 143, 147, 151, 156, 159, 163, 167, 172, 176, 181, 186, 192,
193, 195, 197, 200, 202, 205, 208, 212, 214, 217, 220, 224, 227, 231,
235, 240, 242, 245, 248, 252, 255, 259, 263, 268, 271, 275, 279, 284,
288, 293, 298, 304, 306, 309, 312, 316, 319, 323, 327, 332, 335, 339,
343, 348, 352, 357, 362, 368, 371, 375, 379, 384, 388, 393, 398, 404,
408, 413, 418, 424, 429, 435, 441

which you can look up in the On-Line Encyclopedia of Integer Sequences to find that this sequence describes the total number of 1's in binary expansions of 0, ..., n. There are some formulas there as well, but either they are inexact (involve some Landau symbol term), or they rely on some other non-trivial sequence, or they are pretty complex. The one I like most expresses just what my program above did:

a(0) = 0, a(2n) = a(n)+a(n-1)+n, a(2n+1) = 2a(n)+n+1. - Ralf Stephan, Sep 13 2003

Given these alternatives I guess I'd stick with the above script to compute these numbers. You can remove the assertion and everything related to it, rely on the fact that a ≤ b, and drop the output as well if you include this in a larger program. The result should look like this:

mc = [0, 0]
for n in range(2, 1024):
    a = n // 2
    mc.append(mc[a] + mc[n - a] + a)
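The link to the bit-counting sequence mentioned earlier can be checked numerically. The sketch below assumes an index shift of one (the table starts with two zeros, so mc[n] is compared with the number of 1 bits in 0, ..., n − 1); the helper total_ones is a made-up name.

```python
# Rebuild the best-case table so this snippet runs on its own.
mc = [0, 0]
for n in range(2, 1024):
    a = n // 2
    mc.append(mc[a] + mc[n - a] + a)


def total_ones(m):
    """Total number of 1 bits in the binary expansions of 0, ..., m."""
    return sum(bin(i).count("1") for i in range(m + 1))


# Compare against a running bit count; mc[n] lines up with 0, ..., n - 1.
running = 0
for n in range(1, 1024):
    running += bin(n - 1).count("1")
    assert mc[n] == running

print(mc[8], total_ones(7))  # 12 12
```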

Notice that e.g. for n = 3 you get only two comparisons. Clearly this can only work if you compare both extremal elements to the median one, so that you don't have to compare the extremal ones to one another any more. This illustrates why the above computation only works for best-case input. Worst-case input would have you comparing the minimal and maximal elements with one another at some point, leading to three comparisons as computed by the n⌈lg n⌉ − 2^⌈lg n⌉ + 1 formula.
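For small n, both extremes can be confirmed by brute force over all permutations. This is a rough sketch under stated assumptions: a top-down merge sort that splits at n // 2 (as in the script above) and stops comparing once one half is exhausted. The names sort_count, extremes and worst_formula are invented for this illustration.

```python
from itertools import permutations


def sort_count(items):
    """Top-down merge sort; return (sorted_list, number_of_comparisons)."""
    n = len(items)
    if n < 2:
        return list(items), 0
    a = n // 2                                   # split near the center
    left, left_count = sort_count(items[:a])
    right, right_count = sort_count(items[a:])
    merged, count = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                      # leftovers need no comparisons
    merged.extend(right[j:])
    return merged, left_count + right_count + count


def extremes(n):
    """(fewest, most) comparisons over all permutations of n distinct items."""
    counts = [sort_count(p)[1] for p in permutations(range(n))]
    return min(counts), max(counts)


def worst_formula(n):
    """n*ceil(lg n) - 2**ceil(lg n) + 1, using integer arithmetic only."""
    k = (n - 1).bit_length()                     # equals ceil(lg n) for n >= 1
    return n * k - 2 ** k + 1


print(extremes(3), worst_formula(3))  # (2, 3) 3
for n in range(1, 7):
    print(n, extremes(n), worst_formula(n))
```

The enumeration grows as n!, so it is only practical for single-digit n; for anything larger the dynamic-programming table is the way to go.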

For every comparison, you discharge one element from one of the two lists. So the number of comparisons is at most the sum of the lengths of the two lists. As Platinum demonstrates, it may be less if you reach the end of one array and the other still has items in it.

So the number of comparisons is between n/2 and n.
