
Mergesort - Is Bottom-Up faster than Top-Down?

I've been reading "Algorithms, 4th Ed" by Sedgewick & Wayne, and along the way I've been implementing the algorithms discussed in JavaScript.

I recently took the mergesort examples provided in the book to compare top-down and bottom-up approaches... but I'm finding that bottom-up is running faster (I think). See my analysis on my blog: http://www.akawebdesign.com/2012/04/13/javascript-mergesort-top-down-vs-bottom-up/

I have not been able to find any discussion that says one method of mergesort should be faster than the other. Is my implementation (or analysis) flawed?

Note: my analysis measures the iterative loops of the algorithm, not strictly the array compares/moves. Perhaps this is flawed or irrelevant?

EDIT: My analysis didn't actually time the speed, so my statement about it running "faster" is a bit misleading. I am tracking the "iterations" through the recursive method (top-down) and the for loops (bottom-up) - and bottom-up appears to use fewer iterations.

I have not been able to find any discussion that says one method of mergesort should be faster than the other.

Bottom-up and top-down merge sorts, as well as other variants, were well studied during the 1990s. In a nutshell, if you measure the cost as the number of comparisons of individual keys, the best costs are the same (~ (n lg n)/2), the worst-case cost of top-down is lower than or equal to that of bottom-up (but both ~ n lg n), and the average cost of top-down is lower than or equal to that of bottom-up (but both ~ n lg n), where "lg n" is the binary logarithm. The differences stem from the linear terms. Of course, if n = 2^p, the two variants are in fact exactly the same. This means that, comparison-wise, top-down is never worse than bottom-up. Furthermore, it has been proved that the "half-half" splitting strategy of top-down merge sort is optimal. The research papers are by Flajolet, Golin, Panny, Prodinger, Chen, Hwang and Sedgewick.
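The n = 2^p case can be checked directly. Below is a small JavaScript sketch (my own, not taken from the papers above or from the question's code) that counts key comparisons in both variants; for a power-of-two length, the two perform exactly the same set of merges and therefore the same number of comparisons.

```javascript
// Count key comparisons in top-down vs bottom-up merge sort.
// Names (topDown, bottomUp, merge) are illustrative.
let comparisons = 0;

function merge(a, lo, mid, hi, aux) {
  for (let k = lo; k <= hi; k++) aux[k] = a[k];
  let i = lo, j = mid + 1;
  for (let k = lo; k <= hi; k++) {
    if (i > mid) a[k] = aux[j++];
    else if (j > hi) a[k] = aux[i++];
    else if ((comparisons++, aux[j] < aux[i])) a[k] = aux[j++];
    else a[k] = aux[i++];
  }
}

function topDown(a) {
  const aux = a.slice();
  (function sort(lo, hi) {
    if (hi <= lo) return;
    const mid = lo + ((hi - lo) >> 1);   // "half-half" split
    sort(lo, mid);
    sort(mid + 1, hi);
    merge(a, lo, mid, hi, aux);
  })(0, a.length - 1);
}

function bottomUp(a) {
  const n = a.length, aux = a.slice();
  for (let width = 1; width < n; width *= 2)
    for (let lo = 0; lo < n - width; lo += 2 * width)
      merge(a, lo, lo + width - 1, Math.min(lo + 2 * width - 1, n - 1), aux);
}
```

Run it on a length that is not a power of two and the two counters typically diverge slightly, which is exactly the linear-term difference mentioned above.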

Here is what I came up with in my book Design and Analysis of Purely Functional Programs (College Publications, UK), written in Erlang:

%% Top-down merge sort on a list, splitting without knowing the length.
tms([X|T=[_|U]]) -> cutr([X],T,U);
tms(T)           -> T.

%% cutr(S,T,U): T advances one cell per step while U advances two, so
%% when U runs out, S holds the (reversed) first half and T the second.
cutr(S,[Y|T],[_,_|U]) -> cutr([Y|S],T,U);
cutr(S,    T,      _) -> mrg(tms(S),tms(T)).

%% Merge two sorted lists.
mrg(     [],    T)            -> T;
mrg(      S,   [])            -> S;
mrg(S=[X|_],[Y|T]) when X > Y -> [Y|mrg(S,T)];
mrg(  [X|S],    T)            -> [X|mrg(S,T)].

Note that this is not a stable sort. Also, in Erlang (and OCaml), you need to use aliases (ALIAS = ...) in the patterns if you want to save memory. The trick here is to find the middle of the list without knowing its length. This is done by cutr/3, which maintains two pointers into the input list: one advances by one cell and the other by two, so when the second reaches the end, the first is at the middle. (I learnt this from a paper by Olivier Danvy.) This way, you don't need to keep track of the length, and you don't duplicate the cells of the second half of the list, so you only need (1/2)n lg n extra space, versus n lg n. This is not well known.
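For readers more at home in JavaScript, the same two-pointer split can be sketched on a cons-cell list. This is my own rough rendering of cutr/3 (the {head, tail} cell shape and the names split/fromArray/toArray are mine, not from the book):

```javascript
// Two-pointer split of a singly linked list, mirroring cutr/3 above:
// `t` advances one cell per step while `u` advances two, so when `u`
// runs out, `s` holds the (reversed) first half and `t` the second half.
const cons = (head, tail) => ({ head, tail });
const fromArray = xs => xs.reduceRight((t, h) => cons(h, t), null);
const toArray = l => { const out = []; for (; l; l = l.tail) out.push(l.head); return out; };

// Assumes the list has at least two cells, as in tms's first clause.
function split(list) {
  let s = cons(list.head, null);  // S = [X], first half accumulated in reverse
  let t = list.tail;              // T
  let u = t.tail;                 // U
  while (u && u.tail) {           // cutr(S, [Y|T], [_,_|U])
    s = cons(t.head, s);
    t = t.tail;
    u = u.tail.tail;
  }
  return [s, t];                  // reversed first half, second half
}
```

Splitting [1,2,3,4,5] this way yields [2,1] and [3,4,5]: the first half comes back reversed, which is why the Erlang sort above is not stable.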

It is often claimed that the bottom-up variant is preferable for functional languages or linked lists (Knuth, Panny, Prodinger), but I don't think this is true.

I was puzzled like you by the lack of discussion on merge sorts, so I did my own research and wrote a large chapter about it. I am currently preparing a new edition with more material on merge sorts.

By the way, there are other variants: queue merge sort and on-line merge sort (I discuss the latter in my book).

[EDIT: Since the cost measure is the number of comparisons, there is no difference between choosing an array and a linked list. Of course, if you implement the top-down variant with linked lists, you have to be clever, as you don't necessarily know the number of keys: you'll need to traverse at least half the keys each time, and allocate, in total, (1/2)n lg n cells (if you are clever). Bottom-up merge sort with linked lists actually requires more extra memory, n lg n + n cells. So, even with linked lists, the top-down variant is the best choice. As far as program length goes, your mileage may vary, but in a functional language, top-down merge sort can be made shorter than bottom-up, if stability is not required. There are some papers that discuss implementation issues of merge sort, like in-place sorting (for which you need arrays) or stability. For instance, "A Meticulous Analysis of Mergesort Programs" by Katajainen and Larsson Traff (1997).]

I had asked the same question on the Coursera class forums for the August 2012 edition of this course. Professor Kevin Wayne (of Princeton) replied that in many cases recursion is faster than iteration because of improved caching performance.

So the short answer I got at the time was that top-down merge sort will be faster than bottom-up merge sort for caching reasons.

Please note that the class was taught in Java (not JavaScript).

If by "faster" you mean fewer "iterations", then yes. If you're wondering about execution time, then maybe.

The reason is that some of those 21,513 iterations are doing more work than the 22,527 iterations do.

From looking at the source, it seems that some of the leaf nodes in your diagram are being sorted together rather than individually, resulting in fewer merges and sorts, but with each one taking longer.
