
Mergesort - Is Bottom-Up faster than Top-Down?

I've been reading "Algorithms, 4th Ed" by Sedgewick & Wayne, and along the way I've been implementing the algorithms it discusses in JavaScript.

I recently took the mergesort examples provided in the book to compare the top-down and bottom-up approaches... but I'm finding that bottom-up is running faster (I think). See my analysis on my blog: http://www.akawebdesign.com/2012/04/13/javascript-mergesort-top-down-vs-bottom-up/

I have not been able to find any discussion that says one method of mergesort should be faster than the other. Is my implementation (or analysis) flawed?

Note: my analysis measures the iterative loops of the algorithm, not strictly the array compares/moves. Perhaps this is flawed or irrelevant?

EDIT: My analysis didn't actually measure running time, so my statement about it running "faster" is a bit misleading. I am tracking the "iterations" through the recursive method (top-down) and the for loops (bottom-up), and bottom-up appears to use fewer iterations.

I have not been able to find any discussion that says one method of mergesort should be faster than the other.

Bottom-up and top-down merge sorts, as well as other variants, were well studied during the 1990s. In a nutshell, if you measure the cost as the number of comparisons of individual keys, the best-case costs are the same (~ (n lg n)/2), the worst-case cost of top-down is lower than or equal to that of bottom-up (but both are ~ n lg n), and the average cost of top-down is lower than or equal to that of bottom-up (but both are ~ n lg n), where "lg n" is the binary logarithm. The differences stem from the linear terms. Of course, if n = 2^p, the two variants are in fact exactly the same. This means that, comparison-wise, top-down is never worse than bottom-up. Furthermore, it has been proved that the "half-half" splitting strategy of top-down merge sort is optimal. The research papers are by Flajolet, Golin, Panny, Prodinger, Chen, Hwang and Sedgewick.
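To make the "number of key comparisons" cost measure concrete, here is a minimal JavaScript sketch that counts comparisons for both variants. It follows the usual array-based scheme with an auxiliary array; the names merge, topDown and bottomUp are chosen for illustration and are not the asker's code. A comparison is counted only when both runs still hold keys.

```javascript
// Merge the sorted runs a[lo..mid] and a[mid+1..hi] through aux.
// Returns the number of key comparisons performed.
function merge(a, aux, lo, mid, hi) {
  for (let k = lo; k <= hi; k++) aux[k] = a[k];
  let i = lo, j = mid + 1, compares = 0;
  for (let k = lo; k <= hi; k++) {
    if (i > mid) a[k] = aux[j++];        // left run exhausted: no comparison
    else if (j > hi) a[k] = aux[i++];    // right run exhausted: no comparison
    else {
      compares++;
      if (aux[j] < aux[i]) a[k] = aux[j++];
      else a[k] = aux[i++];
    }
  }
  return compares;
}

// Top-down (recursive) variant: split at the middle, sort halves, merge.
function topDown(input) {
  const a = input.slice();
  const aux = new Array(a.length);
  function sort(lo, hi) {
    if (hi <= lo) return 0;
    const mid = lo + Math.floor((hi - lo) / 2);
    let compares = sort(lo, mid);
    compares += sort(mid + 1, hi);
    compares += merge(a, aux, lo, mid, hi);
    return compares;
  }
  const compares = sort(0, a.length - 1);
  return { sorted: a, compares };
}

// Bottom-up (iterative) variant: merge runs of size 1, 2, 4, ...
function bottomUp(input) {
  const a = input.slice();
  const n = a.length;
  const aux = new Array(n);
  let compares = 0;
  for (let sz = 1; sz < n; sz *= 2) {
    for (let lo = 0; lo < n - sz; lo += 2 * sz) {
      const hi = Math.min(lo + 2 * sz - 1, n - 1);
      compares += merge(a, aux, lo, lo + sz - 1, hi);
    }
  }
  return { sorted: a, compares };
}

// n = 8 = 2^3: both variants perform the same merges, so the counts agree.
const data = [5, 2, 4, 7, 1, 3, 2, 6];
console.log(topDown(data).compares, bottomUp(data).compares);
```

For lengths that are not powers of two, the two variants merge runs of different sizes, which is where the counts can diverge. For n = 5, for instance, this sketch's top-down version merges runs of sizes (1,1), (2,1), (1,1), (3,2), while bottom-up merges (1,1), (1,1), (2,2), (4,1).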

Here is what I came up with in my book Design and Analysis of Purely Functional Programs (College Publications, UK), in Erlang:

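% tms/1: top-down merge sort; lists with fewer than two keys are already sorted.
tms([X|T=[_|U]]) -> cutr([X],T,U);
tms(T)           -> T.

% cutr/3: S collects the first half (reversed), T advances by one cell, U by two.
% When U runs out, T is the second half, shared with the input rather than copied.
cutr(S,[Y|T],[_,_|U]) -> cutr([Y|S],T,U);
cutr(S,    T,      U) -> mrg(tms(S),tms(T)).

% mrg/2: merge two sorted lists.
mrg(     [],    T)            -> T;
mrg(      S,   [])            -> S;
mrg(S=[X|_],[Y|T]) when X > Y -> [Y|mrg(S,T)];
mrg(  [X|S],    T)            -> [X|mrg(S,T)].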

Note that this is not a stable sort. Also, in Erlang (and OCaml), you need to use aliases (ALIAS=...) in the patterns if you want to save memory. The trick here is to find the middle of the list without knowing its length. This is done by cutr/3, which handles two pointers into the input list: one advances by one cell and the other by two, so when the second reaches the end, the first is in the middle. (I learnt this from a paper by Olivier Danvy.) This way, you don't need to keep track of the length and you don't duplicate the cells of the second half of the list, so you only need (1/2)n lg n extra space, versus n lg n. This is not well known.
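For readers following along in JavaScript, the two-pointer idea can be sketched roughly as follows. This is an illustrative port, not code from the book: the cons-cell helpers cons, fromArray and toArray are hypothetical, and the merge is written as a loop rather than recursively to avoid deep call stacks. As in the Erlang version, the first half is rebuilt in reverse while the second half is shared with the input, so the sort is not stable.

```javascript
// Hypothetical immutable cons-cell helpers (null is the empty list).
function cons(head, tail) { return { head, tail }; }
function fromArray(xs) {
  let list = null;
  for (let i = xs.length - 1; i >= 0; i--) list = cons(xs[i], list);
  return list;
}
function toArray(list) {
  const out = [];
  for (; list !== null; list = list.tail) out.push(list.head);
  return out;
}

// Merge two sorted lists into a new list; the input cells are never mutated.
function mrg(s, t) {
  const dummy = cons(undefined, null);
  let last = dummy;                       // last is always a freshly built cell
  while (s !== null && t !== null) {
    if (s.head > t.head) { last.tail = cons(t.head, null); t = t.tail; }
    else { last.tail = cons(s.head, null); s = s.tail; }
    last = last.tail;
  }
  last.tail = s !== null ? s : t;         // share whatever suffix remains
  return dummy.tail;
}

// Top-down merge sort that finds the middle without knowing the length.
function tms(list) {
  if (list === null || list.tail === null) return list;
  let s = cons(list.head, null);          // first half, built in reverse
  let t = list.tail;                      // slow pointer: one cell per step
  let u = list.tail.tail;                 // fast pointer: two cells per step
  while (t !== null && u !== null && u.tail !== null) {
    s = cons(t.head, s);
    t = t.tail;
    u = u.tail.tail;
  }
  // When u runs out, t is the second half, shared with the input list.
  return mrg(tms(s), tms(t));
}

console.log(toArray(tms(fromArray([5, 3, 8, 1, 9, 2])))); // [1, 2, 3, 5, 8, 9]
```

Only the cells of the first half are allocated during the split; the second half is reused as is, which is the source of the memory saving described above.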

It is often claimed that the bottom-up variant is preferable for functional languages or linked lists (Knuth, Panny, Prodinger), but I don't think this is true.

Like you, I was puzzled by the lack of discussion on merge sorts, so I did my own research and wrote a large chapter about it. I am currently preparing a new edition with more material on merge sorts.

By the way, there are other variants: queue merge sort and on-line merge sort (I discuss the latter in my book).

[EDIT: As the measure for the cost is the number of comparisons, there is no difference between choosing an array versus a linked list. Of course, if you implement the top-down variant with linked lists, you have to be clever, as you don't necessarily know the number of keys, but you'll need to traverse at least half the keys each time, and reallocate, in total, (1/2)n lg n cells (if you are clever). Bottom-up merge sort with linked lists actually requires more extra memory, n lg n + n cells. So, even with linked lists, the top-down variant is the best choice. As far as the length of the program goes, your mileage may vary, but in a functional language, top-down merge sort can be made shorter than bottom-up, if stability is not required. There are some papers that discuss implementation issues of merge sort, like in-place sorting (for which you need arrays), or stability, etc. For instance, A Meticulous Analysis of Mergesort Programs, by Katajainen and Larsson Träff (1997).]

I had asked the same question on the Coursera class forums for the August 2012 edition of this course. Professor Kevin Wayne (of Princeton) replied that in many cases recursion is faster than iteration because caching improves performance.

So the short answer I got at that time was that top-down merge sort will be faster than bottom-up merge sort for caching reasons.

Please note that the class was taught in the Java programming language (not JavaScript).

If by faster you mean fewer "iterations", then yes. If you're wondering about execution time, maybe.

The reason is that some of those 21,513 iterations are doing more work than the 22,527 iterations do.

From looking at the source, it seems that some of the leaf nodes in your diagram are being sorted together rather than individually, resulting in fewer merges and sorts, but each one taking longer.
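Since iteration counts say little about execution time, timing the sorts directly is one way to settle the question. Below is a minimal sketch of such a harness. The helper name medianTimeMs is hypothetical, and the sketch assumes an environment where performance.now() is available as a global (modern browsers and recent Node.js versions). Any sort function that takes an array can be passed in; the built-in sort is used here only as a stand-in.

```javascript
// Hypothetical helper: median wall-clock time, in milliseconds, of sortFn
// over several runs. Each run sorts a fresh copy so the input stays unsorted.
function medianTimeMs(sortFn, input, runs = 15) {
  const times = [];
  for (let r = 0; r < runs; r++) {
    const copy = input.slice();
    const start = performance.now();
    sortFn(copy);
    times.push(performance.now() - start);
  }
  times.sort((x, y) => x - y);
  return times[Math.floor(times.length / 2)];
}

// Stand-in usage: time the built-in sort on 100,000 random numbers.
const keys = Array.from({ length: 100000 }, () => Math.random());
const builtin = (a) => a.sort((x, y) => x - y);
console.log(medianTimeMs(builtin, keys).toFixed(2) + " ms");
```

Taking the median of several runs is a design choice meant to dampen the effect of JIT warm-up and garbage collection pauses; the absolute numbers will still vary between engines and machines.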
